Abstract

We study a simple adaptive model in the framework of an N-player normal form game. The
model consists of a repeated game where the players only know their own strategy space and their
own payoff scored at each stage. The information about the other agents (actions and payoffs) is
unknown. In particular, we consider a variation of the procedure studied by Cominetti et al. [9]
where, in our case, each player is allowed to use the number of times she has played each action to
update her strategy. The resultant stochastic process is analyzed via the so-called ODE method
from stochastic approximation theory. We are interested in the convergence of the process to
rest points of a related continuous dynamics. Results concerning almost sure convergence and
convergence with positive probability are obtained and applied to a traffic game. Also, we provide
some examples where convergence occurs with probability zero. Finally, we verify that part of
the analysis holds when players are facing a random environment.

1 Introduction



This paper studies an adaptive model for an N-player repeated game. We consider boundedly
rational players that adapt using simple behavioral rules based on past experiences. 

The decision that a player can make at each stage hinges on the amount of information available. 
Several approaches have been proposed depending on the information that agents can gather over 
time. Fictitious play (see Brown [7], Fudenberg and Levine [14]) is one of the best studied
procedures. Players adapt their behavior by performing best responses to the opponents' average past
play over time. In this case, each player needs to know her own payoff function and to receive
complete information about the other players' moves. A less restrictive framework is when each
player is informed of all the possible payoffs that she could have received by using alternative moves.
The exponential procedure (Freund and Schapire [13]) is, among others, one example of this kind of
adaptive process.

We are interested in a less informative context. Players do not anticipate their opponents'
behavior and we assume that they have no information on the structure of the game. This means
that agents have only their own space of actions and past realized payoffs to react to the environment.
We suppose that players are given a rule of behavior (a decision rule) which depends on a state 
variable. The state variable is updated by a possibly time-dependent rule (an updating rule) based 
on the history of play and current observations. 

A widely studied model in this framework is the cumulative reinforcement learning procedure where 
players conserve a vector perception (the state variable), in which each coordinate of the vector 
represents how a move performs. The updating rule is defined by adding the payoff received to 
the component of the previous vector perception corresponding to the move actually played, and 
keeping the other components unaltered for the unused moves. The decision rule is given by the 
normalization of this perception vector assuming that payoffs are positive. Several results for the 
convergence (and nonconvergence) of players' mixed actions have been obtained (see Beggs [1],
Börgers and Sarin [5], Laslier et al. [18], and a normalized version by Posch [22] for the 2-player
game framework and Erev and Roth [12] for experimental results). In Cominetti et al. [9] the
authors study a model in the same spirit, mainly in the case of the Logit decision rule (which allows
nonpositive payoffs), in the N-player case. Players update the perception vector by performing an
average between the new payoff received and the former perception. Conditions are given to ensure 
the convergence to a Nash equilibrium of a perturbed version of the game. A similar model is 
studied by Leslie and Collins [19], where results concerning 2-player games are obtained. Another 
approach in this information framework is developed by Hart and Mas-Colell [16] where the analysis 
focuses on the convergence of the empirical frequency of play instead of the long-term behavior of 
the mixed strategy. Using techniques based on consistent procedures (see Hart and Mas-Colell [15]), 
it is shown that, for all games, the set of correlated equilibria is attained. 

We consider here a particular updating rule where players keep a perception vector that is updated, 
on the coordinate corresponding to the strategy played, by computing the average between the 
previous perception and the payoff received using the number of times that each strategy has been 
played. The resultant process turns out to be a variation of the one studied by Cominetti et al. [9], 
but in our case, players use more information on the history of play. Using the tools provided by the 
stochastic approximation theory (see e.g., Benaïm [3], Benveniste et al. [4], Kushner and Yin [17]),
the asymptotic behavior of the process can be analyzed by studying a related continuous dynamics.
We are interested in the case in which players use the Logit decision rule and our aim is to find
general conditions to have convergence, almost surely or with positive probability, to an attractor of
the associated ODE. This case is particularly interesting because the rest points of the ODE are 
the Nash equilibria of a related game. 

This paper is organized as follows. Section 2 describes the very basic aspects of the stochastic 
approximation theory. Section 3 states precisely our model in the framework of an infinitely repeated 
N-player normal form game. In Section 4, we restate our algorithm to make it fit in the set-up of the
stochastic approximation and we provide a general almost sure convergence result. In Section 5.1 
we treat the case of the Logit rule in detail. We start by finding an explicit condition to ensure 
almost sure convergence derived from Section 4. This condition demands the smoothing parameters 
associated to the Logit rule to be sufficiently small. It is worth noting that, up to this point, we 
proved that the same results obtained for the process studied by Cominetti et al. [9] hold in our 
setting. Given this fact, we perform a comparison between these two processes in terms of the 
path-wise rate of convergence. Later, under a weaker assumption, we study the convergence with
positive probability to attractors. We apply this result to a particular traffic game on a simple
network (studied as an application in [9]), showing that convergence with positive probability
holds under a much weaker assumption than in the general case. Next, we provide some
examples where the convergence is lost. Finally, in Section 6, we prove that part of the analysis can
be recovered if players face random payoffs. For that purpose, we use some known techniques that,
to our knowledge, have not been widely exploited in the framework of learning in games.

2 Preliminaries 

In this section, we revisit some basic aspects of the stochastic approximation theory following the
approach in Benaïm [3]. The motivation is to study the following discrete process in $\mathbb{R}^d$:

$$z_{n+1} - z_n = \gamma_{n+1}\big(H(z_n) + V_{n+1}\big), \tag{2.1}$$

where $(\gamma_n)_n$ is a nonnegative step-size sequence, $H : \mathbb{R}^d \to \mathbb{R}^d$ is a continuous function and $(V_n)_n$ is
a (deterministic or random) noise. Let us denote by $\mathcal{L}(z_n)$ the limit set of the sequence $(z_n)_n$, i.e.,
the set of points $z$ such that $\lim_{l \to +\infty} z_{n_l} = z$ for some sequence $n_l \to +\infty$.
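To make the recursion (2.1) concrete, the following minimal sketch iterates it for a scalar state. The choices $H(z) = 2 - z$, uniform noise on $[-1, 1]$ and $\gamma_n = 1/n$ are illustrative assumptions of ours, not taken from the paper; the function name `stochastic_approximation` is hypothetical.

```python
import random

def stochastic_approximation(H, z0, steps, noise, gamma=lambda n: 1.0 / n):
    """Iterate z_{n+1} = z_n + gamma_{n+1} * (H(z_n) + V_{n+1}) for a scalar z."""
    z = z0
    for n in range(steps):
        z = z + gamma(n + 1) * (H(z) + noise())
    return z

# Illustrative run: the ODE dz/dt = 2 - z has the single rest point z = 2.
rng = random.Random(0)
z_final = stochastic_approximation(
    H=lambda z: 2.0 - z,                    # assumed mean field
    z0=10.0,
    steps=20000,
    noise=lambda: rng.uniform(-1.0, 1.0),   # assumed zero-mean bounded noise
)
```

With these choices the iterate is simply the running average of $2 + V_k$, so it should settle near the rest point of the associated ODE, in line with the heuristic behind the ODE method.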

The connection between the asymptotic behavior of the discrete process (2.1) and the asymptotic 
behavior of the continuous dynamics 

$$\dot{z} = H(z) \tag{2.2}$$

is obtained as follows. Given $\varepsilon > 0$, $T > 0$, a set $Z \subseteq \mathbb{R}^d$ and two points $x, y \in Z$, we say that there
is an $(\varepsilon, T)$-chain in $Z$ between $x$ and $y$ if there exist $k$ solutions of (2.2) $\{x_1, \dots, x_k\}$ and times
$\{t_1, \dots, t_k\}$ greater than $T$ such that

(1) $x_i([0, t_i]) \subseteq Z$ for all $i \in \{1, \dots, k\}$,

(2) $\|x_i(t_i) - x_{i+1}(0)\| < \varepsilon$ for all $i \in \{1, \dots, k-1\}$,

(3) $\|x_1(0) - x\| < \varepsilon$ and $\|x_k(t_k) - y\| < \varepsilon$.

Definition 2.1. A set $D \subseteq \mathbb{R}^d$ is Internally Chain Transitive (ICT) for the dynamics (2.2) if it is
compact and for all $\varepsilon > 0$, $T > 0$ and $x, y \in D$ there exists an $(\varepsilon, T)$-chain in $D$ between $x$ and $y$.

This definition is derived from the notion of Internally Chain Recurrent sets introduced by
Conley [10]. Roughly speaking, on an ICT set, we can link any two points by a chain of solutions of the
dynamics (2.2) by allowing small perturbations. ICT sets are compact, invariant and attractor-free.
In Benaïm [3] the following general theorem is proved.

Theorem 2.2. Consider the discrete process (2.1). Assume that $H$ is a Lipschitz function and that

(a) the sequence $(\gamma_n)_n$ is deterministic, $\gamma_n \ge 0$, $\sum_n \gamma_n = +\infty$ and $\gamma_n \to 0$,

(b) $\sup_{n \in \mathbb{N}} \|z_n\| < +\infty$, and

(c) for any $T > 0$,

$$\lim_{n \to +\infty} \sup \left\{ \left\| \sum_{j=n}^{k-1} \gamma_{j+1} V_{j+1} \right\| ;\ k \in \Big\{ n+1, \dots, m\Big( \sum_{j=1}^{n} \gamma_j + T \Big) \Big\} \right\} = 0,$$

where $m(t)$ is the largest integer $l$ such that $t \ge \sum_{j=1}^{l} \gamma_j$. Then $\mathcal{L}(z_n)$ is an ICT set for the dynamics
(2.2).

Remark 2.3. In the case where the noise $(V_n)_n$ in (2.1) is a martingale difference sequence with
respect to some filtration on a probability space, we say that (2.1) is a Robbins-Monro [23] algorithm.
In this framework if, for instance, $\sup_n \mathbb{E}(\|V_n\|^2) < +\infty$ and $(\gamma_n)_n \in \ell^2(\mathbb{N})$, then assumption (c) in
Theorem 2.2 holds with probability one (see Benaïm [3, Proposition 4.2]). Moreover, this result is
still valid if the noise can be decomposed into a martingale difference process plus a random variable
that converges to zero almost surely.



3 The model 

An N-player normal form game is introduced as follows. Let $A = \{1, 2, \dots, N\}$ be the set of
players. For every $i \in A$ let $S^i$ be the finite strategy set for player $i$ and let the set
$\Delta^i = \{z \in \mathbb{R}^{|S^i|};\ z^s \ge 0,\ \sum_{s} z^s = 1\}$ denote her mixed strategy set. $S = \prod_{i \in A} S^i$ is the set of strategy profiles
and $\Delta = \prod_{i \in A} \Delta^i$ is the set of mixed strategy profiles. We write $(s, s^{-i}) \in S$ for the strategy profile
where player $i$ uses the strategy $s \in S^i$ and her opponents use the strategy profile $s^{-i} \in \prod_{j \ne i} S^j$,
and we adopt the same notation when a mixed strategy profile is involved. The payoff function of
each player $i \in A$ is denoted by $G^i : S \to \mathbb{R}$, as is its multilinear extension $G^i : \Delta \to \mathbb{R}$.

The game is repeated infinitely and we suppose that players are not informed about the structure of
the game, i.e., neither the number of players (or their strategies) nor the payoff functions are known.
At stage $n \in \mathbb{N}$, each player $i$ selects a strategy $s^i_n \in S^i$ using the mixed strategy $\sigma^i_n \in \Delta^i$.
Then, she gets her own payoff $g^i_n = G^i(s^i_n, s^{-i}_n)$ and this is the only information she receives.

For every $n \in \mathbb{N}$ and for each player $i$, we assume that the mixed strategy at stage $n$, $\sigma^i_n \in \Delta^i$,
is determined as a function of a previous perception vector $x^i_{n-1} \in \mathbb{R}^{|S^i|}$, i.e., $\sigma^i_n = \sigma^i(x^i_{n-1})$ with
$\sigma^i : \mathbb{R}^{|S^i|} \to \Delta^i$. The state space for the perception vector profiles $x = (x^1, \dots, x^N) \in \prod_{i \in A} \mathbb{R}^{|S^i|}$ is
denoted $X$. We also assume that, for every $i \in A$,

the function $\sigma^i : \mathbb{R}^{|S^i|} \to \Delta^i$ is continuous, and
for all $s \in S^i$ and $x^i \in \mathbb{R}^{|S^i|}$, $\sigma^{is}(x^i) > 0$. (A)

We refer to the function $\sigma : X \to \Delta$ with $\sigma(x) = (\sigma^1(x^1), \dots, \sigma^N(x^N))$ as the decision rule of the
players.

At the end of stage $n$, each player $i$ uses the value $g^i_n$ and $x^i_{n-1}$ to obtain the new perception
vector $x^i_n$, and so on. The manner in which $x_n$ is updated is called the updating rule of the players.

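Cominetti et al. [9] study the following updating rule

$$x^{is}_{n+1} = \begin{cases} (1 - \gamma_{n+1})\, x^{is}_n + \gamma_{n+1}\, g^i_{n+1}, & \text{if } s = s^i_{n+1}, \\ x^{is}_n, & \text{otherwise,} \end{cases} \tag{3.1}$$

where we assume that $\gamma_n = \frac{1}{n}$ (see Remark 4.3 for an explanation on this choice).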

In this paper we consider a variation of (3.1). Players will use more information by taking into
account the number of times that their actions have been played. Explicitly, we define the adjusted
process (AP) by

$$x^{is}_{n+1} = \begin{cases} \Big(1 - \dfrac{1}{\theta^{is}_n}\Big) x^{is}_n + \dfrac{1}{\theta^{is}_n}\, g^i_{n+1}, & \text{if } s = s^i_{n+1}, \\ x^{is}_n, & \text{otherwise,} \end{cases} \tag{AP}$$

where $\theta^{is}_n$ denotes the number of times that strategy $s$ has been used by player $i \in A$ up to time
$n$. Given the particular structure in (AP), we can suppose that $x_n$ lies in a compact subset of $X$
for all $n \in \mathbb{N}$. We also note that (A) implies that the decision rule can be assumed component-wise
bounded away from zero.

For feasibility issues we suppose that $\theta^{is}_0 > 0$ for all $i \in A$ and $s \in S^i$, so $\theta^{is}_n$ is not exactly
the number of times that the action $s \in S^i$ has been used by player $i$ (since it contains the initial
condition), but we keep this interpretation. This assumption is not relevant in the asymptotic
analysis developed further on.
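The difference between the two updating rules can be sketched in a few lines. The functions below are hypothetical helpers written for illustration; in particular, using the counter value before it is incremented as the step denominator is our reading of (AP), and the initial counters equal to 1 in the usage lines are an assumption.

```python
def update_original(x, s, g, n):
    """Rule (3.1) with gamma_{n+1} = 1/(n+1): only the played coordinate s moves."""
    gamma = 1.0 / (n + 1)
    x = list(x)
    x[s] = (1.0 - gamma) * x[s] + gamma * g
    return x

def update_adjusted(x, theta, s, g):
    """Rule (AP): coordinate s uses the step 1/theta[s], its own play counter."""
    x, theta = list(x), list(theta)
    x[s] = (1.0 - 1.0 / theta[s]) * x[s] + g / theta[s]
    theta[s] += 1  # one more play of strategy s
    return x, theta

# Usage: with counters starting at 1, the adjusted perception of a strategy
# is the plain average of the payoffs obtained when playing it.
x, theta = update_adjusted([0.0, 0.0], [1, 1], 0, 1.0)
x, theta = update_adjusted(x, theta, 0, 0.0)
```

In (3.1) the step depends only on the stage $n$, whereas in (AP) rarely played strategies keep a large step, which is the extra information on the history of play used by the players.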

As usual we denote by $\mathcal{F}_n$ the $\sigma$-algebra generated by the history up to time $n$,
$\mathcal{F}_n = \sigma\big((s_m, g_m)_{1 \le m \le n}\big)$, where $s_m = (s^1_m, \dots, s^N_m)$ and $g_m = (g^1_m, \dots, g^N_m)$.



4 Asymptotic analysis 

The main difficulty in analyzing (AP) using the tools described in Section 2 is that we have a stochastic
algorithm in discrete time where the step-size is random and, moreover, depends on the coordinates
of the vector to update. Thus, in order to study the asymptotic properties of our adaptive process,
let us restate the updating scheme (AP) in the following manner

$$x^{is}_{n+1} - x^{is}_n = \frac{1}{\theta^{is}_n}\,(g^i_{n+1} - x^{is}_n)\,\mathbb{1}_{\{s = s^i_{n+1}\}} = \frac{1}{(n+1)\lambda^{is}_n}\,(g^i_{n+1} - x^{is}_n)\,\mathbb{1}_{\{s = s^i_{n+1}\}},$$

where $\lambda^{is}_n = \frac{\theta^{is}_n}{n+1}$ is interpreted as the empirical frequency of action $s$ for player $i$ up to time $n$ and $\mathbb{1}_C$
stands for the indicator function of the set $C$. Without loss of generality, assume that $0 < \theta^{is}_0 \le 1$
for every $i \in A$ and $s \in S^i$, in order to have $\lambda^i_n \in \Delta^i$ for all $n \in \mathbb{N}$. Standard computations involving
averages show that

$$\lambda^{is}_{n+1} - \lambda^{is}_n = \frac{1}{n+1}\Big(\mathbb{1}_{\{s = s^i_{n+1}\}} - \lambda^{is}_n + b^{is}_{n+1}\Big),$$

and

$$b^{is}_{n+1} = -\frac{1}{n+2}\Big(\mathbb{1}_{\{s = s^i_{n+1}\}} - \lambda^{is}_n\Big) = O\Big(\frac{1}{n}\Big). \tag{4.1}$$
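The "standard computations involving averages" can be checked exactly with rational arithmetic. The sketch below assumes the normalization $\lambda^{is}_n = \theta^{is}_n/(n+1)$ and the correction term $b^{is}_{n+1} = -\frac{1}{n+2}(\mathbb{1}_{\{s = s^i_{n+1}\}} - \lambda^{is}_n)$, which is our reading of the recursion; the helper names are hypothetical.

```python
from fractions import Fraction

def frequency_step(theta, n, played):
    """Return (lambda_n, lambda_{n+1}, b_{n+1}) for one coordinate, exactly.

    theta  : counter value theta_n (int or Fraction)
    played : 1 if the strategy is chosen at stage n+1, else 0
    """
    lam_n = Fraction(theta, n + 1)
    lam_next = Fraction(theta + played, n + 2)
    b_next = -Fraction(1, n + 2) * (played - lam_n)
    return lam_n, lam_next, b_next

def recursion_holds(theta, n, played):
    """Check lambda_{n+1} - lambda_n = (1/(n+1)) * (indicator - lambda_n + b_{n+1})."""
    lam_n, lam_next, b_next = frequency_step(theta, n, played)
    return lam_next - lam_n == Fraction(1, n + 1) * (played - lam_n + b_next)
```

Under these assumptions both sides reduce to $(\mathbb{1}_{\{s = s^i_{n+1}\}} - \lambda^{is}_n)/(n+2)$, and $|b^{is}_{n+1}| \le 1/(n+2)$ makes the vanishing rate explicit.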

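Then we can express (AP) differently by introducing the empirical frequency of play. The new form
is the (up to a vanishing term) martingale difference scheme

$$\begin{aligned}
x^{is}_{n+1} - x^{is}_n &= \frac{1}{n+1}\Big[\frac{\sigma^{is}(x_n)}{\lambda^{is}_n}\big(G^i(s, \sigma^{-i}(x_n)) - x^{is}_n\big) + U^{is}_{n+1}\Big], \\
\lambda^{is}_{n+1} - \lambda^{is}_n &= \frac{1}{n+1}\Big[\sigma^{is}(x_n) - \lambda^{is}_n + M^{is}_{n+1}\Big],
\end{aligned} \tag{APD}$$

where the noise terms are explicitly

$$\begin{aligned}
U^{is}_{n+1} &= \frac{1}{\lambda^{is}_n}(g^i_{n+1} - x^{is}_n)\mathbb{1}_{\{s = s^i_{n+1}\}} - \frac{\sigma^{is}(x_n)}{\lambda^{is}_n}\big(G^i(s, \sigma^{-i}(x_n)) - x^{is}_n\big) \\
&= \frac{1}{\lambda^{is}_n}(g^i_{n+1} - x^{is}_n)\mathbb{1}_{\{s = s^i_{n+1}\}} - \mathbb{E}\Big(\frac{1}{\lambda^{is}_n}(g^i_{n+1} - x^{is}_n)\mathbb{1}_{\{s = s^i_{n+1}\}} \,\Big|\, \mathcal{F}_n\Big), \\
M^{is}_{n+1} &= \mathbb{1}_{\{s = s^i_{n+1}\}} - \sigma^{is}(x_n) + b^{is}_{n+1} \\
&= \big(\mathbb{1}_{\{s = s^i_{n+1}\}} - \lambda^{is}_n\big) - \mathbb{E}\big((\mathbb{1}_{\{s = s^i_{n+1}\}} - \lambda^{is}_n) \mid \mathcal{F}_n\big) + b^{is}_{n+1}.
\end{aligned} \tag{4.2}$$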

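From now on, we denote by $\varepsilon_n = (U_n, M_n)$ the noise term associated to our process.

The scheme (APD) will allow us to deal with the random (and player-dependent) character of the
step-size in (AP). Now, in the spirit of Theorem 2.2, the asymptotic behavior of (APD) is related
to the continuous dynamics

$$\begin{aligned}
\dot{x}^{is}_t &= \frac{\sigma^{is}(x_t)}{\lambda^{is}_t}\big(G^i(s, \sigma^{-i}(x_t)) - x^{is}_t\big) = \Psi^{is}_x(x_t, \lambda_t), \\
\dot{\lambda}^{is}_t &= \sigma^{is}(x_t) - \lambda^{is}_t = \Psi^{is}_\lambda(x_t, \lambda_t),
\end{aligned} \tag{4.3}$$

with $\Psi_x : X \times \Delta \to \prod_{i \in A} \mathbb{R}^{|S^i|}$ and $\Psi_\lambda : X \times \Delta \to \prod_{i \in A} \Delta^i_0$, where $\Delta^i_0$ stands for the tangent
space to $\Delta^i$, i.e., $\Delta^i_0 = \{z \in \mathbb{R}^{|S^i|};\ \sum_{s \in S^i} z^s = 0\}$. Let us denote by $\Psi$ the function defined by
$\Psi(x, \lambda) = (\Psi_x(x, \lambda), \Psi_\lambda(x, \lambda))$.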



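For the sake of completeness, let us write the process (3.1) as

$$x^{is}_{n+1} - x^{is}_n = \frac{1}{n+1}\Big[\sigma^{is}(x_n)\big(G^i(s, \sigma^{-i}(x_n)) - x^{is}_n\big) + \tilde{U}^{is}_{n+1}\Big], \tag{4.4}$$

with noise term given by

$$\begin{aligned}
\tilde{U}^{is}_{n+1} &= (g^i_{n+1} - x^{is}_n)\mathbb{1}_{\{s = s^i_{n+1}\}} - \sigma^{is}(x_n)\big(G^i(s, \sigma^{-i}(x_n)) - x^{is}_n\big) \\
&= (g^i_{n+1} - x^{is}_n)\mathbb{1}_{\{s = s^i_{n+1}\}} - \mathbb{E}\big((g^i_{n+1} - x^{is}_n)\mathbb{1}_{\{s = s^i_{n+1}\}} \mid \mathcal{F}_n\big).
\end{aligned} \tag{4.5}$$

Therefore, the corresponding continuous dynamics is given by

$$\dot{x}^{is}_t = \sigma^{is}(x_t)\big(G^i(s, \sigma^{-i}(x_t)) - x^{is}_t\big) = \Phi^{is}(x_t), \tag{4.6}$$

where $\Phi : X \to \prod_{i \in A} \mathbb{R}^{|S^i|}$.

Remark 4.1. Observe that the following simple fact holds:

$(x, \sigma(x)) \in X \times \Delta$ is a rest point of (4.3) $\iff$ $x \in X$ is a rest point of (4.6).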

In the sequel we will show that asymptotic properties similar to those of (4.4) can be obtained for 
our process. This means that explicit conditions can be found to ensure that the process (APD) 
converges almost surely to a global attractor for the dynamics (4.3). 

Recall that we have assumed that, for every $n \in \mathbb{N}$ and $i \in A$, the mixed action $\sigma^i_n \in \Delta^i$ is
component-wise bounded away from zero. The purpose of the next simple lemma is to verify that
the same holds, almost surely, for the empirical frequencies of play.

Lemma 4.2. For $n \ge 1$, let $\sigma_n$ be a probability distribution over a finite set $T$ and let $i_{n+1}$ be
an element of $T$ which is drawn with law $\sigma_n$, and assume $(\sigma_n)_n$ is adapted to the natural filtration
generated by the history. For all $j \in T$, set

$$\lambda^j_n = \gamma_n \sum_{p=1}^{n} \mathbb{1}_{\{i_p = j\}},$$

where $(\gamma_n)_n$ is a decreasing positive real sequence such that $\sum_n \gamma_n = +\infty$ and $\sum_n \gamma_n^2 < +\infty$.
Assume that there exists $\bar{\sigma} > 0$ such that $\sigma^j_n \ge \bar{\sigma}$. Then

$$\liminf_{n \to +\infty} \lambda^j_n \ge \bar{\sigma},$$

almost surely, for every $j \in T$.



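Proof. Fix $j \in T$ and let $\mathcal{F}_k$ be the $\sigma$-algebra generated by the history $\{i_1, \dots, i_k\}$ up to time $k$.
Then we have that $\mathbb{E}(\mathbb{1}_{\{i_k = j\}} \mid \mathcal{F}_{k-1}) = \sigma^j_{k-1} \ge \bar{\sigma}$. On the other hand, the random process $(\phi_n)_n$
given by

$$\phi_n = \sum_{k=1}^{n} \gamma_k \big(\mathbb{1}_{\{i_k = j\}} - \mathbb{E}(\mathbb{1}_{\{i_k = j\}} \mid \mathcal{F}_{k-1})\big)$$

is a martingale and $\sup_{n \in \mathbb{N}} \mathbb{E}(\phi_n^2) \le C \sum_{p \ge 1} \gamma_p^2 < +\infty$ for some constant $C$. Hence $(\phi_n)_n$ converges
almost surely. Now Kronecker's lemma (see e.g., Shiryaev [25, Lemma IV.3.2]) gives that

$$\lim_{n \to +\infty} \gamma_n \sum_{k=1}^{n} \big(\mathbb{1}_{\{i_k = j\}} - \mathbb{E}(\mathbb{1}_{\{i_k = j\}} \mid \mathcal{F}_{k-1})\big) = 0. \tag{4.7}$$

So that $\gamma_n \sum_{k=1}^{n} \big(\mathbb{1}_{\{i_k = j\}} - \mathbb{E}(\mathbb{1}_{\{i_k = j\}} \mid \mathcal{F}_{k-1})\big) \le \lambda^j_n - \bar{\sigma}$. Taking liminf and using (4.7) we conclude.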

□ 

Remark 4.3. Observe that we have defined an adjusted version of (4.4) with $\gamma_n = \frac{1}{n}$. The
previous lemma shows that all the analysis in this section can be carried out for more general
step-size sequences $(\gamma_n)_n$ by taking $\lambda^{is}_n = \theta^{is}_n \gamma_n$. We adopt $\gamma_n = \frac{1}{n}$ to keep the convenient interpretation
in terms of frequencies of play.

Proposition 4.4. The process (APD) converges almost surely to an ICT set for the continuous 
dynamics (4.3). 



Proof. We only have to show that our process satisfies the hypotheses of Theorem 2.2. The
assumptions concerning the regularity of the function involved, the step-size sequence and the boundedness
of the process $(x_n, \lambda_n)_n$ hold immediately.

According to (4.2), $(M_n)_n$ is almost surely bounded and can be written as a martingale difference
scheme plus a vanishing term. Observe that $\mathbb{E}(U_{n+1} \mid \mathcal{F}_n) = 0$ and that

for some constant $C$. Then Lemma 4.2 implies that $U_n$ is almost surely bounded. In view of
Remark 2.3, assumption (c) of Theorem 2.2 holds for the noise term $\varepsilon_n = (U_n, M_n)$ and the conclusion
follows. □

Let us define the function $F : X \to \prod_{i \in A} \mathbb{R}^{|S^i|}$ by

$$F^{is}(x) = G^i(s, \sigma^{-i}(x)). \tag{4.8}$$

Cominetti et al. [9] show that if the function $F$ is contracting for the infinity norm, then the process
(4.4) converges almost surely to the unique rest point of the dynamics (4.6). The following result
shows that the same holds for the process (APD) by adding a slightly stronger assumption on the
decision rule $\sigma$.

Proposition 4.5. Assume that $F$ is contracting for the infinity norm and that, for every $i \in A$, the
function $\sigma^i$ is Lipschitz for the infinity norm. Then there exists a unique rest point $(x_*, \sigma(x_*)) \in X \times \Delta$
of (4.3). Furthermore, the set $\{(x_*, \sigma(x_*))\}$ is a global attractor and the process (APD)
converges almost surely to $(x_*, \sigma(x_*))$.



Proof. According to Remark 4.1, $(x_*, \sigma(x_*)) \in X \times \Delta$ is a rest point of (4.3) if and only if $F(x_*) = x_*$,
hence the existence and uniqueness follow from the fact that $F$ is contracting.

Let $0 \le L < 1$ and $K_i$ be the Lipschitz constants associated to the functions $F$ and $\sigma^i$, $i \in A$,
respectively. We want to find a suitable strict Lyapunov function, i.e., a function $V$ that decreases
along the solution paths and that verifies $V^{-1}(\{0\}) = \{(x_*, \lambda_*)\}$ with $\lambda_* = \sigma(x_*)$. Let $V : X \times \Delta \to \mathbb{R}_+$
be defined by

$$V(x, \lambda) = \max\Big\{ \|x - x_*\|_\infty,\ \frac{1}{\zeta}\|\lambda - \lambda_*\|_\infty \Big\},$$

where $\zeta > 0$ will be defined later on. The function $V$ is the maximum of a finite number of
smooth functions, therefore it is absolutely continuous and its derivatives are the evaluation of the
derivatives of the function attaining the maximum. We distinguish two cases:

Case 1. $V(x_t, \lambda_t) = \|x_t - x_*\|_\infty$. Let $i \in A$ and $s \in S^i$ be such that $V(x_t, \lambda_t) = |x^{is}_t - x^{is}_*|$. Let us
assume that $x^{is}_t - x^{is}_* > 0$. Then, for almost all $t \in \mathbb{R}$,

$$\begin{aligned}
\frac{d}{dt} V(x_t, \lambda_t) &= \frac{\sigma^{is}(x_t)}{\lambda^{is}_t}\big(F^{is}(x_t) - F^{is}(x_*) + x^{is}_* - x^{is}_t\big) \\
&\le \frac{\sigma^{is}(x_t)}{\lambda^{is}_t}\,(L - 1)\,\|x_t - x_*\|_\infty \\
&\le -\xi (1 - L)\, V(x_t, \lambda_t),
\end{aligned}$$

for some $\xi > 0$ such that $\sigma^{is}(x) \ge \xi$ for every $i \in A$ and $s \in S^i$ (recall that $\lambda^{is}_t \le 1$). If $x^{is}_t - x^{is}_* < 0$, the computations
are analogous.

Case 2. $V(x_t, \lambda_t) = \frac{1}{\zeta}\|\lambda_t - \lambda_*\|_\infty$. Let $i \in A$ and $s \in S^i$ be such that $V(x_t, \lambda_t) = \frac{1}{\zeta}|\lambda^{is}_t - \lambda^{is}_*|$. We
also assume that $\lambda^{is}_t - \lambda^{is}_* > 0$. Then, for almost all $t \in \mathbb{R}$,

$$\begin{aligned}
\frac{d}{dt} V(x_t, \lambda_t) &= \frac{1}{\zeta}\big[\sigma^{is}(x_t) - \sigma^{is}(x_*) + \lambda^{is}_* - \lambda^{is}_t\big] \\
&\le -\frac{1}{\zeta}\|\lambda_t - \lambda_*\|_\infty + \frac{1}{\zeta}\big|\sigma^{is}(x_t) - \sigma^{is}(x_*)\big| \\
&\le -V(x_t, \lambda_t) + \frac{\max_j K_j}{\zeta}\,\|x_t - x_*\|_\infty \\
&\le -\Big(1 - \frac{\max_j K_j}{\zeta}\Big) V(x_t, \lambda_t),
\end{aligned}$$

and we take $\zeta > 0$ sufficiently large to have $1 > \max_j K_j / \zeta$. Again, if the relation $\lambda^{is}_t - \lambda^{is}_* < 0$
holds, the computations are the same.

Hence $\frac{d}{dt} V(x_t, \lambda_t) \le -K V(x_t, \lambda_t)$ for some $K > 0$. So $V$ decreases exponentially fast along the
solution paths of the dynamics and $V(x, \lambda) = 0$ if and only if $(x, \lambda) = (x_*, \lambda_*)$. Therefore the
set $\{(x_*, \lambda_*)\}$ is a global attractor which is the unique ICT set for (4.3) (see [3, Corollary 5.4]).
Proposition 4.4 finishes the proof. □



5 Logit rule 

In this section we focus on the analysis of a particular decision rule: the Logit rule, which is
well founded in the field of discrete choice models as well as in game theory. Explicitly, the
decision rule $\sigma : X \to \Delta$ is given by

$$\sigma^{is}(x) = \frac{\exp(\beta_i x^{is})}{\sum_{r \in S^i} \exp(\beta_i x^{ir})} \tag{5.1}$$

for every $i \in A$ and $s \in S^i$, where $\beta_i > 0$ is called the smoothing parameter for player $i$. According
to Remark 4.1, the following result shows that the rest points of the dynamics (4.3) are the Nash
equilibria for an entropy perturbed version of the original game.
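For illustration, the Logit rule for a single player can be computed as in the following sketch. The function name `logit` is hypothetical; subtracting the maximum before exponentiating is a standard numerical safeguard that leaves the probabilities unchanged and is not part of the paper.

```python
import math

def logit(x, beta):
    """Logit choice probabilities for a perception vector x and smoothing parameter beta > 0."""
    m = max(x)  # shift by the maximum to avoid overflow; the ratio is unaffected
    weights = [math.exp(beta * (v - m)) for v in x]
    total = sum(weights)
    return [w / total for w in weights]

# Usage: strategies with larger perceptions get larger (but never zero) probability.
p = logit([1.0, 2.0, 3.0], beta=0.7)
```

Every component is strictly positive, so this decision rule satisfies assumption (A); small values of $\beta_i$ push the choice towards the uniform distribution.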

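Lemma 5.1 (Cominetti et al. [9]). Under the Logit decision rule (5.1), if $x \in X$ is a rest point of
the dynamics (4.6), then $\sigma(x)$ is a Nash equilibrium of a game where the strategy set for each
player $i$ is $\Delta^i$ and her payoff $\mathcal{G}^i : \Delta \to \mathbb{R}$ is given by

$$\mathcal{G}^i(\pi) = \sum_{s \in S^i} \pi^{is} G^i(s, \pi^{-i}) - \frac{1}{\beta_i} \sum_{s \in S^i} \pi^{is}\big(\ln(\pi^{is}) - 1\big). \tag{5.2}$$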



5.1 Almost sure convergence 

We want to apply Proposition 4.5 in this particular framework. For that purpose, let us introduce
the maximal unilateral deviation payoff that a single player can face,

$$\eta = \max_{i \in A,\ s \in S^i,\ (r_1, r_2) \in \tilde{S}^{-i}} \big|G^i(s, r_1) - G^i(s, r_2)\big|, \tag{5.3}$$

where $\tilde{S}^{-i} = \{(r_1, r_2) \in S^{-i} \times S^{-i};\ r_1^k \ne r_2^k \text{ for exactly one } k\}$. Now we can show the following
proposition that ensures that, if the parameters are sufficiently small, the unique attractor is attained
with probability one. From now on, we denote $\alpha = \max_{i \in A} \sum_{j \ne i} \beta_j$.
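The two constants can be computed by brute force for a small game. The sketch below is an illustration under our own assumptions: the game is passed as a hypothetical callable `payoff(i, profile)`, and the usage lines encode the two-player game (5.7) with smoothing parameters $\beta_1 = \beta_2 = 0.4$, values chosen by us so that $2\eta\alpha = 0.8$.

```python
from itertools import product

def max_unilateral_deviation(payoff, sizes):
    """eta: largest payoff change for a player when exactly one opponent changes her action."""
    players = range(len(sizes))
    eta = 0.0
    for i in players:
        for profile in product(*(range(m) for m in sizes)):
            for k in players:
                if k == i:
                    continue  # only opponents deviate
                for r in range(sizes[k]):
                    if r == profile[k]:
                        continue
                    other = profile[:k] + (r,) + profile[k + 1:]
                    eta = max(eta, abs(payoff(i, profile) - payoff(i, other)))
    return eta

def alpha(betas):
    """alpha = max over players i of the sum of the opponents' smoothing parameters."""
    return max(sum(b for j, b in enumerate(betas) if j != i) for i in range(len(betas)))

# Usage with game (5.7), payoffs read as (player 1, player 2), indexed [s1][s2].
G1 = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
G2 = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
eta = max_unilateral_deviation(lambda i, p: (G1, G2)[i][p[0]][p[1]], (3, 3))
a = alpha([0.4, 0.4])  # assumed smoothing parameters
```

The enumeration is exponential in the number of players, so it is only meant for checking the smallness condition on toy examples.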

Proposition 5.2. If $2\eta\alpha < 1$, the discrete process (APD) converges almost surely to the unique
rest point $(x_*, \sigma(x_*))$ of the dynamics (4.3).

Proof. We know from Cominetti et al. [9, Proposition 5] that, if $2\eta\alpha < 1$, the function $F$ (defined
in (4.8)) is contracting for the infinity norm. Observe also that, for every $i \in A$, the function $\sigma^i$ is
Lipschitz for the infinity norm, since it is a smooth function defined on a compact set. Therefore,
Proposition 4.5 applies. □
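Since $F$ is a contraction when $2\eta\alpha < 1$, its fixed point $x_*$ can be approximated by simply iterating $x \mapsto F(x)$. The following sketch does this for the two-player game (5.7) with an assumed common smoothing parameter $\beta = 0.4$; the helper names and the starting point are hypothetical choices of ours.

```python
import math

G1 = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]  # payoffs of player 1 in game (5.7), indexed [s1][s2]
G2 = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]  # payoffs of player 2 in game (5.7), indexed [s1][s2]

def logit(x, beta):
    m = max(x)
    w = [math.exp(beta * (v - m)) for v in x]
    total = sum(w)
    return [v / total for v in w]

def F(x1, x2, beta):
    """F^{is}(x) = G^i(s, sigma^{-i}(x)) for the two-player game above."""
    p1, p2 = logit(x1, beta), logit(x2, beta)
    f1 = [sum(G1[s][r] * p2[r] for r in range(3)) for s in range(3)]
    f2 = [sum(G2[s][r] * p1[s] for s in range(3)) for r in range(3)]
    return f1, f2

def rest_point(beta, iterations=200):
    """Approximate the fixed point of F by Picard iteration from an arbitrary start."""
    x1, x2 = [1.0, 0.0, 0.0], [0.0, 0.5, 1.0]
    for _ in range(iterations):
        x1, x2 = F(x1, x2, beta)
    return x1, x2

x1_star, x2_star = rest_point(0.4)
```

By the symmetry of this game the uniform mixed strategy is a fixed point, so every component of the computed perception vectors should be close to $1/3$.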
Rate of Convergence

Up to this point, we were able to reproduce some of the theoretical results of the original model (4.4)
regarding its almost sure convergence to global attractors. Now, we want to justify the inclusion of
a counter of the previous actions in terms of the rate of convergence when both learning processes
(APD) and (4.4) converge almost surely to $(x_*, \lambda_*)$ and $x_*$, respectively, and the step-size $\gamma_n = \frac{1}{n}$
is considered. This rate of convergence is closely linked with the largest real part of the eigenvalues of the
Jacobian matrices of the functions $\Psi = (\Psi_x, \Psi_\lambda)$ and $\Phi$ at the respective rest points.

Let us denote by $\rho(B)$ the maximal real part of the eigenvalues of a matrix $B \in \mathbb{R}^{k \times k}$, i.e.,

$$\rho(B) = \max\{\mathrm{Re}(\mu_j);\ j = 1, \dots, k, \text{ where } \mu_j \in \mathbb{C} \text{ is an eigenvalue of the matrix } B\}.$$

We say that a matrix $B$ is stable if $\rho(B) < 0$.
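As a small illustration of this definition, the sketch below computes $\rho(B)$ for a $2 \times 2$ real matrix from its trace and determinant. Restricting to the $2 \times 2$ case is a simplification of ours to stay within the standard library; the function names are hypothetical.

```python
import cmath

def max_real_part_2x2(B):
    """rho(B) for a 2x2 matrix: the eigenvalues solve mu^2 - trace*mu + det = 0."""
    (a, b), (c, d) = B
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)  # complex square root handles complex pairs
    return max(((trace + root) / 2).real, ((trace - root) / 2).real)

def is_stable(B):
    """A matrix is stable when all its eigenvalues have negative real part."""
    return max_real_part_2x2(B) < 0

# Usage: a diagonal matrix with negative entries is stable, a pure rotation is not.
rho_diag = max_real_part_2x2([[-1, 0], [0, -2]])
rotation_stable = is_stable([[0, 1], [-1, 0]])
```

For larger matrices one would rely on a numerical eigenvalue routine instead of the closed formula.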

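Lemma 5.3. Assume that $2\eta\alpha < 1$. Let $(x_*, \lambda_*)$ and $x_*$ be the unique rest points of the dynamics
(4.3) and (4.6), respectively. Then

$$-1 \le \rho(\nabla\Psi(x_*, \lambda_*)) < -\frac{1}{2} \le -\frac{N}{\sum_{k \in A} |S^k|} \le \rho(\nabla\Phi(x_*)) < 0. \tag{5.4}$$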



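Proof. Straightforward computations concerning the function $\Psi$ (see (4.3)) show that

$$\frac{\partial \Psi^{is}_x}{\partial \lambda^{jr}}(x_*, \lambda_*) = 0 \quad \text{and} \quad \frac{\partial \Psi^{is}_\lambda}{\partial \lambda^{jr}}(x_*, \lambda_*) = -\mathbb{1}_{\{is = jr\}},$$

for every $i, j \in A$ and $(s, r) \in S^i \times S^j$. Therefore, the matrix $\nabla\Psi(x_*, \lambda_*)$ looks like

$$\nabla\Psi(x_*, \lambda_*) = \begin{pmatrix} \nabla_x \Psi_x(x_*, \lambda_*) & 0 \\ L & -I \end{pmatrix}, \tag{5.5}$$

where $I$ stands for the identity matrix and $\nabla_x \Psi_x(x_*, \lambda_*)$ denotes the Jacobian matrix of $\Psi_x$ with
respect to $x$ at $(x_*, \lambda_*)$. Notice that the interesting eigenvalues of this matrix are given by its upper-left
block because of the zero block and the identity matrix on the right side in (5.5). Observe also
that $\frac{\partial \Psi^{is}_x}{\partial x^{is}}(x_*, \lambda_*) = -1$, i.e., the matrix $\nabla_x \Psi_x(x_*, \lambda_*)$ has diagonal terms equal to $-1$.

On the other hand, we know that every eigenvalue of a complex matrix $B = (B_{pq})$ lies within at
least one of the Gershgorin discs $D_p(B) = \{z \in \mathbb{C},\ |z - B_{pp}| \le R_p\}$ where $R_p = \sum_{q \ne p} |B_{pq}|$. Given
the specific form of the matrix $\nabla_x \Psi_x(x_*, \lambda_*)$ we can estimate the position of its eigenvalues. So, in
our case,

$$R_{is} = \sum_{j \ne i} \sum_{r \in S^j} \left| \frac{\partial \Psi^{is}_x}{\partial x^{jr}}(x_*, \lambda_*) \right|,$$

since $\frac{\partial \Psi^{is}_x}{\partial x^{jr}}(x_*, \lambda_*) = 0$ if $i = j$ and $r \ne s$. This follows from the fact that $F^{is}(x)$ (defined in (4.8))
is independent of the vector $x^i$. Explicitly,

$$\frac{\partial \Psi^{is}_x}{\partial x^{jr}}(x_*, \lambda_*) = \beta_j \sigma^{jr} \big[ G^i(s, r, \sigma^{-(i,j)}) - G^i(s, \sigma^{-i}) \big],$$

where

$$G^i(s, r, \sigma^{-(i,j)}) = \sum_{a \in S,\ a^i = s,\ a^j = r} G^i(a) \prod_{k \ne i, j} \sigma^{k a^k}$$

for $i \ne j$, and $\sigma$ stands for $\sigma(x_*)$. So that

$$R_{is} = \sum_{j \ne i} \beta_j \sum_{r \in S^j} \sigma^{jr} \big| G^i(s, r, \sigma^{-(i,j)}) - G^i(s, \sigma^{-i}) \big| \le \eta\alpha.$$

Then we have that all the eigenvalues of the matrix $\nabla_x \Psi_x(x_*, \lambda_*)$ are contained in the complex disc

$$\{z \in \mathbb{C},\ |z + 1| \le \eta\alpha\} \supseteq \bigcup_{i \in A,\ s \in S^i} D_{is}(\nabla_x \Psi_x(x_*, \lambda_*)), \tag{5.6}$$

which implies that $\rho(\nabla\Psi(x_*, \lambda_*)) < -1/2$.

Analogous computations involving the function $\Phi$ show that

$$D_{is}(\nabla\Phi(x_*)) \subseteq \{z \in \mathbb{C},\ |z + \sigma^{is}(x_*)| \le \sigma^{is}(x_*)\,\eta\alpha\},$$

for every $i \in A$ and $s \in S^i$. Since $-\sigma^{is}(x_*) + \sigma^{is}(x_*)\,\eta\alpha < 0$, then $\rho(\nabla\Phi(x_*)) < 0$.

The fact that $-1 \le \rho(\nabla\Psi(x_*, \lambda_*))$ is evident. Finally, the inequality $-N / \sum_{k \in A} |S^k| \le \rho(\nabla\Phi(x_*))$
follows since the trace of the matrix $\nabla\Phi(x_*)$ is equal to $-N$. □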

Remark 5.4. Notice that $1/2 = N / \sum_{k \in A} |S^k|$ if and only if $|S^k| = 2$ for all $k \in A$.

The following reduced version of Chen [8, Theorem 3.1.1] will be useful.

Theorem 5.5. Consider the discrete process given by (2.1). Assume that the following hold.

(a) For every $n \in \mathbb{N}$, $\gamma_n > 0$, $\lim_{n \to +\infty} \gamma_n = 0$, $\sum_n \gamma_n = +\infty$ and

$$\lim_{n \to +\infty} \frac{\gamma_n - \gamma_{n+1}}{\gamma_{n+1}\gamma_n} = \bar{\gamma} \ge 0.$$

(b) $z_n \to z_0$ almost surely.

(c) There exists $\delta \in (0, 1]$ such that

(c.1) for a path such that $z_n \to z_0$, the noise $V_n$ can be decomposed as $V_n = V'_n + V''_n$ where

$$\sum_{n \ge 1} \gamma_n^{1-\delta} V'_{n+1} < +\infty \quad \text{and} \quad V''_n = O(\gamma_n^{\delta}),$$

(c.2) the function $H$ is locally bounded and is differentiable at $z_0$ such that $H(z) = \mathbf{H}(z - z_0) + r(z)$
where $r(z_0) = 0$ and $r(z) = o(\|z - z_0\|)$ as $z \to z_0$, and

(c.3) the matrix $\mathbf{H}$ is stable and, furthermore, $\mathbf{H} + \delta\bar{\gamma} I$ is also stable.

Then, almost surely,

$$(1/\gamma_n)^{\delta} (z_n - z_0) \to 0, \quad \text{as } n \to +\infty.$$

The previous result allows us to show that, in some sense, our algorithm is faster. This will show
that, under the common hypothesis $2\eta\alpha < 1$ (which ensures almost sure convergence for both
processes), the use of the adjusted process (APD) helps the players to adapt their behavior
faster than the original process (4.4).

Proposition 5.6. Assume that $2\eta\alpha < 1$ and let $(x_*, \lambda_*) \in X \times \Delta$ and $x_* \in X$ be the unique rest
points of the dynamics (4.3) and (4.6), respectively. Then the following estimates hold:

(i) for almost all trajectories of (4.4),

$$n^{\delta} (x_n - x_*) \to 0, \quad \text{as } n \to +\infty,$$

for every $\delta \in (0, |\rho(\nabla\Phi(x_*))|)$;

(ii) for almost all trajectories of (APD),

$$n^{\delta} \big((x_n, \lambda_n) - (x_*, \lambda_*)\big) \to 0, \quad \text{as } n \to +\infty,$$

for every $\delta \in (0, 1/2)$.

Proof. Recall that $\varepsilon_n = (U_n, M_n)$ and $\tilde{U}_n$ are the noise terms associated to (APD) and (4.4),
respectively (see (4.2) and (4.5)). We observe that, for both processes, hypotheses (a) and (b) in
Theorem 5.5 are immediately satisfied since $\gamma_n = \frac{1}{n}$ (with $\bar{\gamma} = 1$) and since Proposition 5.2 applies.
Let us verify that condition (c) holds.

(i) Fix $\delta \in (0, |\rho(\nabla\Phi(x_*))|)$. The random process $(\tilde{U}_n)_n$ is almost surely bounded and satisfies
$\mathbb{E}(\tilde{U}_{n+1} \mid \mathcal{F}_n) = 0$. Therefore, $Z_n = \sum_{k=1}^{n} (1/k)^{1-\delta}\, \tilde{U}_{k+1}$ is a martingale where
$\sup_n \mathbb{E}(\|Z_n\|^2) \le C \sum_{k \ge 1} (1/k)^{2(1-\delta)} < +\infty$, thus convergent (since $\delta < 1/2$). To conclude, observe that the
function $\Phi$ is smooth and that the matrix $\nabla\Phi(x_*) + \delta I$ is stable.

(ii) Fix $\delta \in (0, 1/2)$. We repeat the argument by noticing that $\varepsilon_n = \hat{\varepsilon}_n + b_n$ where $b_n = O(1/n)$
and $\mathbb{E}(\hat{\varepsilon}_{n+1} \mid \mathcal{F}_n) = 0$. To finish, we use the fact that the matrix $\nabla\Psi(x_*, \lambda_*) + \delta I$ is stable
since inequality (5.4) holds.

□ 

Remark 5.7. Two important comments are in order.

(a) For the process (APD), if the matrix $C_n = \mathbb{E}(\varepsilon_{n+1}^{T} \varepsilon_{n+1} \mid \mathcal{F}_n)$ converges almost surely to a
deterministic positive definite matrix $C$, then $\sqrt{n}\,((x_n, \lambda_n) - (x_*, \lambda_*))$ converges in distribution to
a normal random variable (see e.g., Benveniste et al. [4] or Kushner and Yin [17]). For the
process (4.4), if the sequence $\tilde{C}_n = \mathbb{E}(\tilde{U}_{n+1}^{T} \tilde{U}_{n+1} \mid \mathcal{F}_n)$ converges almost surely to a deterministic
positive definite matrix $\tilde{C}$, then it can be shown (see Duflo [11]) that $n^{|\rho(\nabla\Phi(x_*))|}(x_n - x_*)$
converges almost surely to a finite random variable. For instance, if we consider the game defined
by (5.7), we can show that both sequences $(C_n)_n$ and $(\tilde{C}_n)_n$ converge almost surely to a
deterministic positive definite matrix and that $|\rho(\nabla\Phi(x_*))| < 1/2$. Therefore, in general, nothing
more can be said when the step-size $\gamma_n = \frac{1}{n}$ is considered. Figure 1 shows the results of
a numerical experiment in this particular example where $2\eta\alpha = 0.8$ and $|\rho(\nabla\Phi(x_*))| \approx 0.3$.

(b) Observe that a better rate can be achieved for the process (4.4) if the step-size is given by
$\gamma_n = \frac{a}{n}$ for a constant $a$ that is large enough relative to $|\rho(\nabla\Phi(x_*))|$. This leads to the rate
$o(n^{-\delta})$ for all $\delta \in (0, 1/2)$. However, it is
fairly unrealistic to assume that the players know this information in advance. Nevertheless,
we always have that $|\rho(\nabla\Phi(x_*))| < |\rho(\nabla\Psi(x_*, \lambda_*))|$ and then the scheme (APD) can reach
at least the same path-wise rate of convergence under the hypotheses of Proposition 5.6,
independently of the step-size considered.

$$\begin{pmatrix} (0,0) & (1,0) & (0,1) \\ (0,1) & (0,0) & (1,0) \\ (1,0) & (0,1) & (0,0) \end{pmatrix} \tag{5.7}$$

Figure 1: $\|(x_n, \lambda_n) - (x_*, \lambda_*)\|_2$ versus $\|x_n - x_*\|_2$.
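A numerical experiment of this kind can be sketched as follows. This is not the code behind Figure 1: the common smoothing parameter $\beta = 0.4$, the initial counters equal to 1, the zero initial perceptions and the reading of each entry of (5.7) as (payoff of player 1, payoff of player 2) are all assumptions made here for illustration.

```python
import math
import random

# Payoffs of game (5.7), read as (player 1, player 2) and indexed [s1][s2].
G = (
    [[0, 1, 0], [0, 0, 1], [1, 0, 0]],
    [[0, 0, 1], [1, 0, 0], [0, 1, 0]],
)

def logit(x, beta):
    m = max(x)
    w = [math.exp(beta * (v - m)) for v in x]
    total = sum(w)
    return [v / total for v in w]

def simulate_adjusted(steps, beta=0.4, seed=0):
    """Run the adjusted process (AP) with Logit choices; return (x, theta)."""
    rng = random.Random(seed)
    x = [[0.0, 0.0, 0.0] for _ in range(2)]   # perception vectors
    theta = [[1, 1, 1] for _ in range(2)]     # play counters (initial value 1 is assumed)
    for _ in range(steps):
        moves = [rng.choices(range(3), weights=logit(x[i], beta))[0] for i in range(2)]
        for i in range(2):
            s = moves[i]
            g = G[i][moves[0]][moves[1]]             # realized payoff of player i
            x[i][s] += (g - x[i][s]) / theta[i][s]   # step 1/theta on the played coordinate
            theta[i][s] += 1
    return x, theta

x_end, theta_end = simulate_adjusted(2000)
```

Tracking the distance between `x_end` and the rest point along the run, and doing the same for rule (3.1), would give curves comparable in spirit to those of the figure.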



5.2 Convergence with positive probability 

The estimates given by Lemma 5.3 allows us to improve the range of parameters in which general 
convergence results can be obtained for the process (APD). We start by showing that there exists 
a unique rest point of (4.3) which is stable if 1 < 2r/a < 2. Let y C X x A be the set of rest points 
of (4.3) and let B{A) be the basin of attraction corresponding to the attractor A. 

Proposition 5.8. Assume that $1 < 2\eta\alpha < 2$. Then there exists a unique rest point $(x_*,\lambda_*)$ for the
dynamics (4.3), which is an attractor.

Proof. Let $(x_*,\lambda_*) \in \mathcal{Y}$. If $1 < 2\eta\alpha < 2$, equation (5.6) shows that the matrix $\nabla\Psi(x_*,\lambda_*)$ is stable.
To prove that $\{(x_*,\lambda_*)\}$ is an attractor, take $V(x,\lambda) = ((x,\lambda) - (x_*,\lambda_*))^T D\,((x,\lambda) - (x_*,\lambda_*))$ as a
(local) Lyapunov function where, for instance, $D$ is the positive definite solution of the Lyapunov
equation $\nabla\Psi(x_*,\lambda_*)^T D + D\,\nabla\Psi(x_*,\lambda_*) = -I$. Given the fact that basins of attraction cannot
overlap, the set $\mathcal{Y}$ is finite since $X \times \Delta$ is compact and $\nabla\Psi$ is regular. Finally, $\mathcal{Y}$ reduces to one point
since, in this case, it is impossible to have finitely many, but more than one, stable equilibria due to the Poincaré-Hopf
Theorem (see e.g., Milnor [20, Chapter 6]). □
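As a small illustration of the Lyapunov equation used in the proof above, the following sketch solves $A^T D + DA = -I$ exactly for a $2 \times 2$ stable matrix. The matrix `A` in the usage note is a hypothetical stand-in for $\nabla\Psi(x_*,\lambda_*)$ (it is not taken from the paper), and the helper names are ours.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve_lyapunov_2x2(A):
    """Solve A^T D + D A = -I for a symmetric D = [[p, q], [q, r]]."""
    (a, b), (c, d) = [[Fraction(v) for v in row] for row in A]
    # Entry (1,1): 2(a p + c q) = -1
    # Entry (1,2): b p + (a + d) q + c r = 0
    # Entry (2,2): 2(b q + d r) = -1
    M = [[2 * a, 2 * c, Fraction(0)],
         [b, a + d, c],
         [Fraction(0), 2 * b, 2 * d]]
    rhs = [Fraction(-1), Fraction(0), Fraction(-1)]
    det = det3(M)
    if det == 0:
        raise ValueError("singular system: no unique solution for this A")
    solution = []
    for col in range(3):
        Mc = [row[:] for row in M]
        for i in range(3):
            Mc[i][col] = rhs[i]
        solution.append(det3(Mc) / det)  # Cramer's rule
    p, q, r = solution
    return [[p, q], [q, r]]


def is_positive_definite_2x2(D):
    """Sylvester's criterion for a symmetric 2x2 matrix."""
    return D[0][0] > 0 and D[0][0] * D[1][1] - D[0][1] * D[1][0] > 0


def lyapunov_residual(A, D):
    """Return A^T D + D A, which should equal -I."""
    return [[sum(A[k][i] * D[k][j] + D[i][k] * A[k][j] for k in range(2))
             for j in range(2)] for i in range(2)]
```

For instance, with the hypothetical stable matrix `A = [[-1, 2], [-2, -1]]` (eigenvalues $-1 \pm 2i$), `solve_lyapunov_2x2(A)` returns $D = \tfrac12 I$, which is positive definite, so $V$ decreases along trajectories near the rest point.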



The following definition is crucial to ensure convergence with positive probability of the process
$(x_n,\lambda_n)_n$ to a given (not necessarily global) attractor.

Definition 5.9. Let $(z_n)_n$ be a discrete stochastic process with state space $Z$. A point $z \in Z$ is
attainable by $(z_n)_n$ if, for each $m \in \mathbb{N}$ and every open neighborhood $U$ of $z$, $P(\exists n \ge m,\ z_n \in U) > 0$.



The following lemma relies heavily on the particular form of the updating rule (AP) considered in this
work.

Lemma 5.10. Fix $\lambda = (\lambda^1,\dots,\lambda^N) \in \Delta$. Set $x^i \in \mathbb{R}^{|S^i|}$ such that $x^{is} = G^i(s,\lambda^{-i})$ for all $s \in S^i$
and put $x = (x^1,\dots,x^N) \in X$. Then the point $(x,\lambda) \in X \times \Delta$ is attainable by the process $(x_n,\lambda_n)_n$.
In particular, any rest point of the dynamics (4.3) is attainable.



Proof. The fact that $\sigma^{is}(x^i_n) \ge \xi > 0$ for every $i \in \mathcal{A}$, $s \in S^i$ and $n \in \mathbb{N}$ implies that any finite
sequence generated by (APD) has positive probability. The updating rule (AP) can be expressed
as (including, only for this time, the initial conditions)



where $\nu^{is}(k) = \inf\{q \ge 1 : \theta^{is}_q = k\}$, i.e., the stage at which player $i$ has played strategy $s \in S^i$ for the
$k$-th time. Observe that, without loss of generality, we can suppose that $m = 0$ in the definition of
attainability given the particular form of the updating rule (5.8).

Let $\zeta^s_n$ be the number of times that the strategy profile $s \in S$ has been played up to time $n$. Hence,
for every $i \in \mathcal{A}$ and $s \in S^i$, (5.8) implies that

$$x^{is}_{n+1} = \sum_{r \in S^{-i}} G^i(s,r)\,\frac{\zeta^{(s,r)}_n}{\theta^{is}_n} + b^{is}_n,$$

with $b^{is}_n = O((\theta^{is}_n)^{-1})$. Observe that $\theta^{is}_n \to +\infty$ almost surely due to the conditional Borel-Cantelli
lemma. Fix $\varepsilon > 0$ and let $\bar{n}$ be an integer such that $\bar{k}^i_s = \bar{n} k^i_s \in \mathbb{N}$, where, for every $i \in \mathcal{A}$ and
$s \in S^i$, $k^i_s$ denotes a rational number satisfying $|\lambda^{is} - k^i_s| < \varepsilon$. For a strategy profile $s \in S$, let
us define the positive integers $n_s = \prod_{i \in \mathcal{A}} \bar{k}^i_{s^i}$ and $n = \sum_{s \in S} n_s$. Now we take the sequence generated
by (APD) defined by $l \in \mathbb{N}$ blocks of size $n$ where, within each block, each $s \in S$ is played exactly
$n_s$ times, regardless of the order of play. Fix $i \in \mathcal{A}$ and $r \in S^i$, so that, by construction,

$$\lambda^{ir}_{ln} = \frac{\theta^{ir}_{ln}}{ln} = \frac{\bar{k}^i_r \prod_{j \ne i}\big(\sum_{p \in S^j} \bar{k}^j_p\big)}{\prod_{j \in \mathcal{A}}\big(\sum_{p \in S^j} \bar{k}^j_p\big)} = \frac{k^i_r}{\sum_{p \in S^i} k^i_p} = \lambda^{ir} + \delta_\varepsilon,$$

where $\delta_\varepsilon \to 0$ as $\varepsilon \to 0$. Finally, given $\varepsilon' > 0$, take $l$ large and $\varepsilon$ small to have
$\|(x_{ln+1},\lambda_{ln+1}) - (x,\lambda)\| < \varepsilon'$. □
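The block construction in this proof can be sketched for two players as follows. This is only an illustration under stated assumptions: the estimate of an action is updated as a running average of the payoffs received when that action is played (so the first play overwrites the initial value), integer weights `k1`, `k2` play the role of the integers $\bar{k}^i_s$, and all function and variable names are ours.

```python
from fractions import Fraction


def block_play(G1, G2, k1, k2, blocks):
    """Play `blocks` identical blocks; profile (s, r) appears k1[s]*k2[r] times per block.

    G1[s][r] and G2[s][r] are the payoffs of players 1 and 2 when player 1
    plays s and player 2 plays r. Returns (x1, x2, lam1, lam2).
    """
    x1 = [Fraction(0)] * len(k1)   # payoff estimates of player 1
    x2 = [Fraction(0)] * len(k2)   # payoff estimates of player 2
    theta1 = [0] * len(k1)         # number of times each action was played
    theta2 = [0] * len(k2)
    stage = 0
    for _block in range(blocks):
        for s in range(len(k1)):
            for r in range(len(k2)):
                for _rep in range(k1[s] * k2[r]):
                    stage += 1
                    theta1[s] += 1
                    theta2[r] += 1
                    # running-average update of the played action only
                    x1[s] += (G1[s][r] - x1[s]) / theta1[s]
                    x2[r] += (G2[s][r] - x2[r]) / theta2[r]
    lam1 = [Fraction(t, stage) for t in theta1]  # empirical frequencies
    lam2 = [Fraction(t, stage) for t in theta2]
    return x1, x2, lam1, lam2
```

With strictly positive weights, the empirical frequencies equal the normalized weights and each estimate $x^{1s}$ equals $G^1(s,\lambda^2)$ exactly, which is the target point of the lemma for the chosen $\lambda$.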

Recall that $\mathcal{L}(z_n)$ is the limit set of the sequence $(z_n)_n$. The following result is the goal of this
subsection.

Proposition 5.11. If an attractor $A$ for the dynamics (4.3) satisfies $B(A) \cap \mathcal{Y} \ne \emptyset$, then
$P(\mathcal{L}(x_n,\lambda_n) \subset A) > 0$. In particular, under the Logit decision rule (5.1), if $1 < 2\eta\alpha < 2$,
then $\mathcal{Y}$ reduces to one point $(x_*,\lambda_*)$ and $P((x_n,\lambda_n) \to (x_*,\lambda_*)) > 0$.

Before providing the proof we need to briefly introduce the following concepts. Let $\phi$ be the semi-flow
induced by the differential equation (4.3) and let $Y_t$ be the continuous-time affine process associated
with the discrete process $(x_n,\lambda_n)_n$, i.e.,

$$Y(\tau_n + u) = (x_n,\lambda_n) + u\,\frac{(x_{n+1},\lambda_{n+1}) - (x_n,\lambda_n)}{1/(n+1)}, \tag{5.9}$$

for all $n \in \mathbb{N}$ and $u \in [0, \frac{1}{n+1})$, where $\tau_n = \sum_{m=1}^{n} \frac{1}{m}$. Let $(\mathcal{F}_t)_{t \ge 0}$ be the natural associated filtration.

The following technical lemma is needed. We only provide an outline of the proof because we follow
exactly the lines in Benaïm [3, Proposition 4.1] along with the explicit computations provided in
the proof of Schreiber [24, Theorem 2.6].

Lemma 5.12. For all $T > 0$ and $\delta > 0$,

$$P\Big(\sup_{u \ge t}\Big[\sup_{0 \le h \le T} \|Y(u+h) - \phi_h(Y(u))\|\Big] \ge \delta \;\Big|\; \mathcal{F}_t\Big) \le \frac{C(\delta,T)}{\exp(ct)},$$

for some positive constants $c$ and $C(\delta,T)$ when $t > 0$ is large enough.

Outline of the proof. Roughly speaking, the process (APD) can be written as

$$(x_{n+1},\lambda_{n+1}) - (x_n,\lambda_n) = \frac{1}{n+1}\big(\Psi(x_n,\lambda_n) + \varepsilon_{n+1} + b_{n+1}\big),$$

where $(\varepsilon_n)_n$ is almost surely bounded, $E(\varepsilon_{n+1} \mid \mathcal{F}_n) = 0$ and $b_n = O(1/n)$. Recall that $m(t)$ is the
largest integer $l$ such that $t \ge \tau_l$. Then from Benaïm [3, Proposition 4.1] we have that

$$\sup_{0 \le h \le T} \|Y(u+h) - \phi_h(Y(u))\| \le C(T)\Big(\frac{1}{m(u)} + \Delta(u,T)\Big),$$

for $u$ sufficiently large, where

$$\Delta(u,T) = \sup_{0 \le h \le T} \Big\| \sum_{l=m(u)}^{m(u+h)-1} \frac{\varepsilon_{l+1} + b_{l+1}}{l+1} \Big\|.$$

To conclude, it is enough to follow word by word, in this simpler case, the lines in the proof of
Schreiber [24, Theorem 2.6]. □



Proof of Proposition 5.11. In view of Proposition 5.8 and Lemmas 5.10 and 5.12, the result follows
directly from Benaïm [3, Theorem 7.3]. □

Remark 5.13. Notice that for Lemma 5.10 and for the first part of the statement in Proposition 5.11,
we have only assumed the condition (A) on the decision rule $\sigma$. Furthermore, from
Benaïm [3, Theorem 7.3], we have the following estimate for the probability of convergence to an
attractor $A$. If the set $U \subset X \times \Delta$ is such that $\overline{U} \subset B(A)$, then there exist numbers $T, \delta > 0$,
depending on $U$, so that

$$P(\mathcal{L}(x_n,\lambda_n) \subset A) \ge \Big(1 - \frac{C(\delta,T)}{\exp(ct)}\Big)\, P(\exists u \ge t,\ Y(u) \in U),$$

for all $t > 0$, where the constants $C(\delta,T)$ and $c$ are given by Lemma 5.12.



A traffic game 

The convergence results (almost sure or with positive probability) to attractors obtained when the
Logit decision rule is considered are valid under the strong assumption $2\eta\alpha < 2$. In fact, this condition
becomes very difficult to verify as the number of players increases. Moreover, nonconvergence
can occur for some games (see Section 5.3 for details) if the parameter $\eta\alpha$ is large. In this part, we
discuss the application developed in Cominetti et al. [9, Section 3] and we show
that a result in the spirit of Proposition 5.11 can be obtained under a much weaker condition.

Consider a network whose topology consists of a set of parallel routes. Each route $r \in \mathcal{R}$ in
the network is characterized by an increasing sequence of values $c^r_1 \le \dots \le c^r_N$, where $c^r_u$ represents
the average travel time when $r$ carries a load of $u$ users.

The traffic game is defined as follows. The strategy set is common to all players, i.e., $S^i = \mathcal{R}$ for
every $i \in \mathcal{A}$, with $\mathcal{R}$ the set of available routes. The payoff to each player $i$, when the strategy profile
$r \in \mathcal{R}^N$ is played (i.e., when the network is loaded by the configuration $r$), is given by the value
$G^i(r) = -c^{r^i}_u$, where $u$ is the number of players using route $r^i$, that is, minus her travel time.

This traffic game is shown to be a potential game in the sense that there exists a function $\Lambda :
[0,1]^{N \times |\mathcal{R}|} \to \mathbb{R}$ such that

$$\frac{\partial \Lambda}{\partial \pi^{is}}(\pi) = G^i(s,\pi^{-i}),$$

for every $\pi \in \Delta$. Explicitly, the function $\Lambda$ is given by

$$\Lambda(\pi) = -E_\pi\Big[\sum_{r \in \mathcal{R}} \sum_{u=1}^{U^r} c^r_u\Big], \tag{5.10}$$

where the expectation is taken with respect to the random variables $U^r = \sum_{i \in \mathcal{A}} X^{ir}$ with $X^{ir}$
independent Bernoulli variables such that $P(X^{ir} = 1) = \pi^{ir}$. It is also shown that the second
derivatives of $\Lambda$ are zero except for

$$\frac{\partial^2 \Lambda}{\partial \pi^{jr}\,\partial \pi^{ir}}(\pi) = E_\pi\big[c^r_{U^r_{ij}+1} - c^r_{U^r_{ij}+2}\big] \in [-\eta, 0], \tag{5.11}$$

with $i \ne j$, where $U^r_{ij} = \sum_{k \ne i,j} X^{kr}$.
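To make the payoff and the potential structure concrete, here is a minimal sketch restricted to pure strategy profiles. It assumes routes are indexed from 0 and that `cost[r][u - 1]` stores $c^r_u$; the function `potential` is the potential (5.10) evaluated at a pure profile (the expectation is then trivial), and all names are ours.

```python
from itertools import product


def load(profile, route):
    """Number of players using `route` in the pure profile."""
    return sum(1 for r in profile if r == route)


def payoff(i, profile, cost):
    """Payoff of player i: minus the travel time on her route."""
    route = profile[i]
    return -cost[route][load(profile, route) - 1]


def potential(profile, cost):
    """Minus the sum over routes r of c^r_1 + ... + c^r_{u_r}."""
    return -sum(sum(cost[r][:load(profile, r)]) for r in range(len(cost)))


def is_exact_potential(cost, n_players):
    """Check that every unilateral deviation changes the deviator's payoff
    and the potential by the same amount."""
    routes = range(len(cost))
    for profile in product(routes, repeat=n_players):
        for i in range(n_players):
            for new_route in routes:
                deviated = profile[:i] + (new_route,) + profile[i + 1:]
                gain = payoff(i, deviated, cost) - payoff(i, profile, cost)
                if gain != potential(deviated, cost) - potential(profile, cost):
                    return False
    return True
```

For example, with the hypothetical costs `cost = [[1, 2, 4], [2, 3, 5]]` and three players, the profile `(0, 0, 1)` gives payoff $-2$ to every player, and `is_exact_potential(cost, 3)` confirms the potential property on all pure profiles.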

Recall that we are assuming that players use the Logit decision rule in the learning process (APD).
We also suppose that the smoothing parameters are identical for all players, i.e., $\beta_i = \beta$ for every
$i \in \mathcal{A}$. Notice that, in this framework, the value $\eta$ (defined in (5.3)) translates to

$$\eta = \max\{\eta^r_u ;\ r \in \mathcal{R},\ 2 \le u \le N\} = \max\{c^r_u - c^r_{u-1} ;\ r \in \mathcal{R},\ 2 \le u \le N\}. \tag{5.12}$$

Cominetti et al. [9] obtain the following result (among others). Recall that $\alpha = \max_{i \in \mathcal{A}} \sum_{j \ne i} \beta_j =
(N-1)\beta$.
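The two quantities that drive the conditions in this part can be computed directly from the cost sequences. The sketch below assumes `cost[r][u - 1]` stores $c^r_u$ and uses exact rationals; the function names are ours.

```python
from fractions import Fraction


def eta(cost):
    """Largest increment c^r_u - c^r_{u-1} over routes r and loads 2 <= u <= N."""
    return max(c[u] - c[u - 1] for c in cost for u in range(1, len(c)))


def alpha(betas):
    """max_i sum_{j != i} beta_j; equals (N - 1) * beta when all beta_j = beta."""
    total = sum(betas)
    return max(total - b for b in betas)
```

With the hypothetical costs `[[1, 2, 4], [2, 3, 5]]` and three players using $\beta = 1/5$, one gets $\eta = 2$ and $\alpha = 2/5$, so $2\eta\alpha = 8/5$ while $\eta\beta = 2/5$. This illustrates how a bound on $\eta\beta$ can hold even when a bound on $2\eta\alpha$ fails, since $\alpha$ grows with the number of players.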

Proposition 5.14. The following hold:

(i) if $2\eta\alpha < 1$, the function $F$ is contracting for the infinity norm and the process (4.4) converges
almost surely to the unique rest point of (4.6);

(ii) if $\eta\beta < 1$, (4.6) has a unique rest point $x_* \in X$ which is symmetric in the sense that $x_* =
(\hat{x},\dots,\hat{x})$. Furthermore, $\{x_*\}$ is an attractor for (4.6).

Notice that the first part of the proposition above can be recovered for the process (APD) since
Proposition 5.2 applies. The second part provides a much weaker condition for the existence and
uniqueness of a rest point of (4.6) (or, equivalently, a Nash equilibrium of the perturbed game
defined in Lemma 5.1). Observe also that, although the second part gives the existence
of an attractor, no convergence result is obtained for the discrete process (4.4). The next result
shows that, under the assumption $\eta\beta < 1$, something more can be said for (APD).

Proposition 5.15. If $\eta\beta < 1$, (4.3) has a unique rest point $(x_*,\lambda_*) \in X \times \Delta$ which is symmetric in
the sense that $x_* = (\hat{x},\dots,\hat{x})$ and $\lambda_* = (\hat\lambda,\dots,\hat\lambda) = \sigma(x_*)$. Furthermore, $\{(x_*,\lambda_*)\}$ is an attractor
for (4.3) and $P((x_n,\lambda_n) \to (x_*,\lambda_*)) > 0$.

Proof. The existence and uniqueness of the symmetric rest point of (4.3) follow from Remark 4.1
and Proposition 5.14. Lemma 5.16 below shows that the matrix $\nabla\Psi(x_*,\lambda_*)$ is stable. Hence,
$\{(x_*,\lambda_*)\}$ is an attractor for (4.3) and Proposition 5.11 applies. □

Lemma 5.16. If $\eta\beta < 1$, then the matrix $\nabla\Psi(x_*,\lambda_*)$ is stable.

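Proof. Recall that the matrix $J^\beta = \nabla_x\Psi_x(x_*,\lambda_*)$ is the upper-left block of the matrix $\nabla\Psi(x_*,\lambda_*)$
(see (5.5)). Observe that, from the definition of the function $\Psi_x$, the fact that $\sigma^i$ depends only on
$x^i$, and (5.10), the entries of $J^\beta$ are given by

$$\begin{aligned}
J^\beta_{is,jr} &= \frac{\partial \Psi^{is}_x}{\partial x^{jr}}(x_*,\lambda_*) = \frac{\partial}{\partial x^{jr}}\Big(\frac{\partial \Lambda}{\partial \pi^{is}}(\sigma(x))\Big)(x_*) - 1_{\{is=jr\}} \\
&= \sum_{k \in \mathcal{A}} \sum_{r' \in \mathcal{R}} \frac{\partial^2 \Lambda}{\partial \pi^{kr'}\,\partial \pi^{is}}(\sigma(x_*))\,\frac{\partial \sigma^{kr'}}{\partial x^{jr}}(x_*) - 1_{\{is=jr\}} \\
&= \beta\lambda^{jr}_*(1-\lambda^{jr}_*)\,E_{\lambda_*}\big[c^r_{U^r_{ij}+1} - c^r_{U^r_{ij}+2}\big]\,1_{\{s=r,\ i \ne j\}} - 1_{\{is=jr\}}.
\end{aligned} \tag{5.13}$$

Since $\lambda_*$ is symmetric ($\lambda^{ir}_* = \lambda^{jr}_* = \hat\lambda^r$ for all $i,j \in \mathcal{A}$), $J^\beta$ is a symmetric matrix. Let us show that
$J^\beta$ is negative definite by modifying the trick used in Cominetti et al. [9, Proposition 12]. Take
$h \in \mathbb{R}^{N|\mathcal{R}|} \setminus \{0\}$; then, from (5.13),

$$h^T J^\beta h = \sum_{r \in \mathcal{R}} \Big[\beta \sum_{i \ne j} h^{ir}\,\hat\lambda^r(1-\hat\lambda^r)\,h^{jr}\,E_{\lambda_*}\big[c^r_{U^r_{ij}+1} - c^r_{U^r_{ij}+2}\big] - \sum_{i} (h^{ir})^2\Big].$$

For every $i \in \mathcal{A}$ and $r \in \mathcal{R}$, put $v^{ir} = h^{ir}\sqrt{(1-\hat\lambda^r)/\hat\lambda^r}$, $Z^{ir} = v^{ir} X^{ir}$ and set $\eta^r_0 = \eta^r_1 = 0$. Therefore,

$$\begin{aligned}
h^T J^\beta h &\le \sum_{r \in \mathcal{R}} E_{\lambda_*}\Big[\beta \sum_{i \ne j} Z^{ir} Z^{jr}\big(c^r_{U^r_{ij}+1} - c^r_{U^r_{ij}+2}\big) - \sum_{i} (Z^{ir})^2\Big] \\
&= \sum_{r \in \mathcal{R}} E_{\lambda_*}\Big[-\beta\eta^r_{U^r} \sum_{i \ne j} Z^{ir} Z^{jr} - \sum_{i} (Z^{ir})^2\Big] \\
&= \sum_{r \in \mathcal{R}} E_{\lambda_*}\Big[-\beta\eta^r_{U^r}\Big(\sum_{i} Z^{ir}\Big)^2 + (\eta^r_{U^r}\beta - 1) \sum_{i} (Z^{ir})^2\Big] < 0,
\end{aligned}$$

where the last inequality follows by observing that $\eta^r_u \le \eta$. □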



Remark 5.17. In fact, part (ii) in Proposition 5.14 holds true if $\eta\beta < 2$. The authors provide an
explicit (local) Lyapunov function which does not seem suitable in our case.



5.3 Nonconvergence 

In order to give an idea of the behavior of the stochastic process defined by (APD) when the
parameter $\beta$ (we assume $\beta_i = \beta$ for all $i \in \mathcal{A}$) becomes large, we provide a small class of games
which underlines the relevance of the hypotheses considered throughout this document. Consider a
2-player symmetric game, i.e., the strategy set $S = S^1 = S^2$ is common to both players and the payoffs verify
that $G^1 = (G^2)^T$. Let us assume that $G^1$ has constant row sums, that is, $\sum_r G^1(s,r) = k \in \mathbb{R}$ for
every $s \in S$. It is easy to check that for this kind of game there exists a rest point of (4.3) which
has the form $(\bar{x},\sigma(\bar{x})) \in X \times \Delta$ such that $\bar{x}^i = (k/|S|,\dots,k/|S|)$ for $i \in \{1,2\}$. We also assume that
$\sum_s G^1(s,s) \ne k$. A game that satisfies the preceding conditions is the good (resp. bad) Rock-Scissors-Paper game

$$\begin{pmatrix} 0 & a & -b \\ -b & 0 & a \\ a & -b & 0 \end{pmatrix}$$

where $0 < b < a$ (resp. $0 < a < b$) or the game (5.7).
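The existence of a rest point that does not depend on $\beta$ can be checked numerically. The sketch below assumes the Logit rule has the usual form $\sigma^{is}(x^i) = \exp(\beta x^{is}) / \sum_r \exp(\beta x^{ir})$ (our reading of (5.1)) and verifies that, for a payoff matrix with constant row sums $k$, the constant vector with entries $k/|S|$ is mapped to the uniform mixed action, whose expected payoffs reproduce that same vector. Function names are ours.

```python
import math


def logit(x, beta):
    """Logit (softmax) choice probabilities with smoothing parameter beta."""
    top = max(x)
    weights = [math.exp(beta * (v - top)) for v in x]  # shifted for stability
    total = sum(weights)
    return [w / total for w in weights]


def expected_payoffs(G, sigma):
    """Vector of payoffs G(s, sigma) against the opponent's mixed action."""
    return [sum(g * p for g, p in zip(row, sigma)) for row in G]


def rsp(a, b):
    """Rock-Scissors-Paper payoff matrix of the row player."""
    return [[0, a, -b], [-b, 0, a], [a, -b, 0]]


def is_rest_point(G, beta, tol=1e-12):
    """Check that x = (k/|S|, ..., k/|S|) satisfies x^s = G(s, sigma(x))."""
    k = sum(G[0])
    x_bar = [k / len(G)] * len(G)
    payoffs = expected_payoffs(G, logit(x_bar, beta))
    return all(abs(p - v) <= tol for p, v in zip(payoffs, x_bar))
```

Since the game is symmetric, the same check applies to both players; for instance `is_rest_point(rsp(2, 1), beta)` holds for small and large values of `beta` alike.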

The (strong) hypotheses above ensure that at least one rest point of (4.3) does not depend on
the parameter $\beta$. In the following we show that if the parameter $\beta$ is sufficiently large
then the rest point $(\bar{x},\sigma(\bar{x}))$ becomes linearly unstable. Later, we will prove that this implies that
$P((x_n,\lambda_n) \to (\bar{x},\sigma(\bar{x}))) = 0$.

Lemma 5.18. If the parameter $\beta > 0$ is sufficiently large, then there exists an eigenvalue $\bar\mu$ of the
matrix $\nabla\Psi(\bar{x},\sigma(\bar{x}))$ such that $\mathrm{Re}(\bar\mu) > 0$.

Proof. Again, let $J^\beta = \nabla_x\Psi_x(\bar{x},\sigma(\bar{x}))$ be the upper-left block of the Jacobian matrix of the function
$\Psi$, which is the only relevant part, evaluated at $(\bar{x},\sigma(\bar{x}))$. The precise expression for the entries of
$J^\beta$ is

$$J^\beta_{is,jr} = \begin{cases} -1, & \text{if } i = j \text{ and } s = r, \\ 0, & \text{if } i = j \text{ and } s \ne r, \\ \dfrac{\beta}{|S|}\Big[G^i(s,r) - \dfrac{k}{|S|}\Big], & \text{otherwise,} \end{cases}$$

$i,j \in \{1,2\}$. Thus the matrix $J^\beta$ has the form

$$J^\beta = \begin{pmatrix} -I & \beta\Gamma \\ \beta\Gamma & -I \end{pmatrix},$$

with $\Gamma \in \mathbb{R}^{|S|} \times \mathbb{R}^{|S|}$. Observe that we can decompose $J^\beta$ as $J^\beta = \beta J - I$, where

$$J = \begin{pmatrix} 0 & \Gamma \\ \Gamma & 0 \end{pmatrix}.$$

Let $\mu_1,\dots,\mu_{|S|} \in \mathbb{C}$ be the eigenvalues of $\Gamma$ (counting multiplicity). Since we have assumed that
$\sum_s G^1(s,s) \ne k$, the trace of $\Gamma$ is not zero. So there exists some eigenvalue $\mu_k$, $k \in \{1,\dots,|S|\}$,
with nonzero real part. If $v$ is an eigenvector of $\Gamma$ associated with $\mu_k$, then $\mu_k$ is an eigenvalue
of $J$ with corresponding eigenvector $u = (v,v) \in \mathbb{R}^{|S|} \times \mathbb{R}^{|S|}$ since

$$Ju = \begin{pmatrix} 0 & \Gamma \\ \Gamma & 0 \end{pmatrix}\begin{pmatrix} v \\ v \end{pmatrix} = \begin{pmatrix} \Gamma v \\ \Gamma v \end{pmatrix} = \mu_k u.$$

If $\mathrm{Re}(\mu_k) > 0$, the proof is finished. If $\mathrm{Re}(\mu_l) \le 0$ for all $l \in \{1,\dots,|S|\}$, then $\mathrm{Re}(\mu_k) < 0$. Also,
the trace of $J$ is zero and therefore there exists an eigenvalue $\mu$ of $J$ (which is not an eigenvalue of $\Gamma$)
such that $\mathrm{Re}(\mu) > 0$.

Finally, observe that

$$\det(\beta J - I - \bar\mu I) = \beta^{2|S|}\det\Big(J - \frac{1+\bar\mu}{\beta} I\Big), \tag{5.14}$$

and it is straightforward from (5.14) that $\bar\mu$ is an eigenvalue of the matrix $J^\beta$ if $\mu = (1+\bar\mu)/\beta$ is an
eigenvalue of $J$. Then $\bar\mu = \beta\mu - 1$, whose real part is strictly positive for a sufficiently large $\beta$. □

Proposition 5.19. There exist $\beta > 0$ large enough and at least one rest point $(\bar{x},\sigma(\bar{x})) \in X \times \Delta$
of (4.3) such that

$$P((x_n,\lambda_n) \to (\bar{x},\sigma(\bar{x}))) = 0.$$

Proof. We can directly apply Brandière and Duflo [6, Theorem 1]. The hypotheses of the theorem
concerning the continuous dynamics and the step-size of the discrete process (APD) are immediately
satisfied. The only condition that deserves attention is the one concerning the excitation of the noise
in a repulsive direction at $(\bar{x},\sigma(\bar{x}))$. Explicitly, it is sufficient to prove that

$$\liminf_n E(\|\varepsilon^{pr}_{n+1}\|^2 \mid \mathcal{F}_n) > 0 \ \text{ a.s. on the event } T(\bar{x},\sigma(\bar{x})) = \{(x_n,\lambda_n) \to (\bar{x},\sigma(\bar{x}))\}, \tag{5.15}$$

since the noise term $\varepsilon_n = (U_n,M_n)$ is almost surely bounded. Here the superscript $pr$ stands for
the projection onto the repulsive subspace spanned by the eigenvectors associated with the eigenvalues
with positive real part.

Fix $i \in \{1,2\}$, take $\beta$ large to have an eigenvalue $\bar\mu$ of $\nabla\Psi(\bar{x},\sigma(\bar{x}))$ such that $\mathrm{Re}(\bar\mu) > 0$ and let $v$ be a
corresponding (possibly generalized) eigenvector. The vector $v$ has the form $v = (v_1,v_2)$. Notice
that necessarily $v_2 \ne 0$ since, if $v_2 = 0$, then $v_1$ is a vector of ones, which is indeed an eigenvector
for the upper-left block of $\nabla\Psi(\bar{x},\sigma(\bar{x}))$ having $-1$ as the associated eigenvalue. So that

E(||C + iH 2 |^)>IE(|K £ n + i,v) V || 2 I T n ) 

> Mf +1 ) 2 | F n ) 

r 

>cE((M^ +1 ) 2 \F n ), 

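with $j = -i$ and for some $r \in S$ and $c > 0$.
In view of (4.2),

$$E((M^{jr}_{n+1})^2 \mid \mathcal{F}_n) = E\big((1_{\{s^j_{n+1}=r\}} - \sigma^{jr}(x^j_n))^2 \mid \mathcal{F}_n\big) + O\Big(\frac{1}{n}\Big) = \sigma^{jr}(x^j_n)(1 - \sigma^{jr}(x^j_n)) + O\Big(\frac{1}{n}\Big).$$

Finally, take the $\liminf_n$ in the previous expression on the event $T(\bar{x},\sigma(\bar{x}))$ to conclude that (5.15)
holds, since $\sigma^{is}$ is bounded away from zero for every $i \in \{1,2\}$ and $s \in S$. □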

Remark 5.20. For the game (5.7), if $\beta > 3/2$, then $\rho(\nabla\Psi(\bar{x},\sigma(\bar{x}))) > 0$. Compare with Propositions
5.2 and 5.11.

As observed by Pemantle [21], nonconvergence results like the previous proposition are not very
interesting if the set of unstable points is too large. In particular, the most useful consequences can
be stated when this set is finite. This is the case of our example (5.7) and, moreover, it is easy to
check that $(\bar{x},\sigma(\bar{x}))$ is the unique rest point of (4.3) for all $\beta > 0$. The previous result shows that,
for large $\beta$, $(\bar{x},\sigma(\bar{x}))$ has probability zero of being the limit of the process, while for small $\beta$ it is
almost surely the limit. Simulations suggest that there is a cycle that attracts the trajectories and
that the empirical frequencies of play still converge to $\sigma(\bar{x})$. Figure 2 shows the behavior of the
procedure (specifically the evolution of the mixed action $\sigma^1_n$ of player 1) when $\beta$ is large.






However, it does not seem plausible to pursue a result like Proposition 5.19 for a general class of
games. For instance, consider the 2-player zero-sum game defined by the payoff




(5.16) 



Let $(x_*,\sigma(x_*))$, with $\sigma^1(x^1_*) = \sigma^2(x^2_*) = (1/(1+e^\beta),\ e^\beta/(1+e^\beta))$ and $x^1_* = x^2_* = (-e^\beta/(1+e^\beta),\ 1/(1+e^\beta))$,
be the unique rest point of (4.3). In this case, every eigenvalue of $\nabla\Psi(x_*,\sigma(x_*))$ is equal to
$-1$. Then $P((x_n,\lambda_n) \to (x_*,\sigma(x_*))) > 0$ for all $\beta > 0$ due to Proposition 5.11.

Figure 2: The mixed action $\sigma^1_n$ of Player 1 when $\beta = 4$.



6 Random environment 

Our aim in this section is to consider the case where players receive, at each stage, a perturbed
version of their payoffs. The general result given by Theorem 2.2 allows us to add some perturbation
to the process (APD) while leaving the results presented in this work unaltered. For example, we
can consider that each player $i \in \mathcal{A}$ gets a payoff $\tilde{g}^i_n = g^i_n + \varepsilon^i_n$ at stage $n$, where $(\varepsilon^i_n)_n$ is a martingale
difference process bounded in $L^2$ or a vanishing random variable (see Remark 2.3). In this section,
we are interested in a different kind of perturbation.

In the sequel, the model runs as before. We only add that, at each time $n \in \mathbb{N}$, each
player $i \in \mathcal{A}$ receives a random payoff $g^i_n = G^i(s_n,w_n)$, where the sequence $(w_n)_n$ is a Markov chain,
controlled by the parameter $\lambda \in \Delta$, with finite state space $W$, i.e., there exists a family of
transition matrices $(P(\lambda))_{\lambda \in \Delta}$ where

$$P(w_{n+1} = w \mid \mathcal{F}_n) = P_{(w_n,w)}(\lambda_n),$$

for $w \in W$, with $\lambda_n$ determined by the learning process and $\mathcal{F}_n$ being the $\sigma$-algebra generated by
the history $(s_1,g_1,w_1,\dots,s_n,g_n,w_n)$ up to time $n$. We assume the following.



(A1) For any $\lambda \in \Delta$ the Markov chain with transition matrix $P(\lambda)$ has a unique invariant probability
$\tau(\lambda) \in \Delta_W$, where $\Delta_W$ denotes the set of probabilities over $W$.

(A2) The function $P : \Delta \to \mathcal{M}$ (the set of stochastic matrices of dimension $|W| \times |W|$), $\lambda \mapsto P(\lambda)$,
is of class $C^1$.

Remark 6.1. The hypotheses above imply that the function $\tau$ is also of class $C^1$ (see Benaïm [2,
Lemma 3.2]).



Observe that the unique recurrent class of the associated Markov chain may be periodic. Note also
that the process $(s_n,w_n)_n$ is a Markov chain, controlled by the parameter $(x,\lambda) \in X \times \Delta$,
with state space $S \times W$. The independence hypothesis implies that, for a given $(x,\lambda)$, the associated
transition matrix is given by

$$P_{(a,b)}(x,\lambda) = \prod_{i=1}^{N} \sigma^{ir^i}(x^i)\,P_{(w,w')}(\lambda),$$

for each $(a,b) = ((s,w),(r,w')) \in (S \times W)^2$. From (A1), the Markov chain with transition matrix
$P(x,\lambda)$ has a unique invariant probability $\tau(x,\lambda)$, where

$$\tau_a(x,\lambda) = \prod_{i=1}^{N} \sigma^{is^i}(x^i)\,\tau_w(\lambda), \tag{6.1}$$

for $a = (s,w) \in S \times W$.
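The product form (6.1) can be checked on a small instance. The sketch below builds the joint transition matrix on $S \times W$ from fixed mixed actions and a fixed environment matrix, and tests invariance of the product measure with exact rationals. The inputs in the usage note are hypothetical, `sigmas[i]` stands for $\sigma^i(x^i)$ at a frozen $x$, and `P` for $P(\lambda)$ at a frozen $\lambda$.

```python
from fractions import Fraction
from itertools import product


def profile_prob(sigmas, profile):
    """Probability of a pure profile under independent mixed actions."""
    p = Fraction(1)
    for i, s_i in enumerate(profile):
        p *= sigmas[i][s_i]
    return p


def joint_chain(sigmas, P):
    """States (s, w) and transitions T[a][b] = prod_i sigma^i[r^i] * P[w][w']."""
    profiles = list(product(*(range(len(s)) for s in sigmas)))
    states = [(s, w) for s in profiles for w in range(len(P))]
    T = {a: {b: profile_prob(sigmas, b[0]) * P[a[1]][b[1]] for b in states}
         for a in states}
    return states, T


def product_measure(sigmas, tau, states):
    """Candidate invariant measure: prod_i sigma^i[s^i] * tau[w]."""
    return {a: profile_prob(sigmas, a[0]) * tau[a[1]] for a in states}


def is_invariant(measure, T, states):
    """Check measure * T == measure, entry by entry."""
    return all(sum(measure[a] * T[a][b] for a in states) == measure[b]
               for b in states)
```

For example, with `P = [[1/2, 1/2], [1/4, 3/4]]` the invariant probability of the environment is `tau = [1/3, 2/3]`, and the product measure built from it passes `is_invariant`, whereas the one built from the uniform vector `[1/2, 1/2]` does not.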



Let us now state precisely the corresponding discrete process in this framework. We keep the
notation and the hypotheses (A) on the decision rule $\sigma$ of the players. For each player $i \in \mathcal{A}$, we
define the new payoff function $G^i : S \times W \to \mathbb{R}$ and its multilinear extension to $\Delta \times \Delta_W$ as usual.
Therefore, the updating rule in this case is given by



1 



x 



oi.J X n + aii, 9 



n+1 



1 



lis 



otherwise, 



(APM) 



with $g^i_n = G^i(s_n,w_n)$. As before, we can conveniently recast the process (APM) as



where 



1 



x 



n+l 



\is \ i 

A n+1 ~ \ 



n + l 
1 

n + l 



^(xj,)G i ( S ,a- i (x n ),r(A n )) , U 



+ 



ts 

n+l 



a-(4)-Atf + K+i 



u. 



n+l = {91+1 - X n) 1 {s=si +1 } ~ (^(4)^(5,(7 i (x n ),r(A n )) - X%) , 



M l n s +l = t {s=si+i} - a ts (x l n ) + b n+1 , 



(APDM) 



(6.2) 



and $b_n = O(\frac{1}{n})$. Observe that the process $(U_n)_n$ is not a martingale difference sequence. The next
result shows that the analogue of Proposition 4.5 holds.

Proposition 6.2. The process (APDM) converges almost surely to an ICT set of the continuous
dynamics

$$\begin{aligned}
\dot{x}^{is}_t &= \frac{\sigma^{is}(x^i_t)}{\lambda^{is}_t}\big[G^i(s,\sigma^{-i}(x_t),\tau(\lambda_t)) - x^{is}_t\big] = \Psi^{is}_x(x_t,\lambda_t), \\
\dot{\lambda}^{is}_t &= \sigma^{is}(x^i_t) - \lambda^{is}_t = \Psi^{is}_\lambda(x_t,\lambda_t).
\end{aligned} \tag{6.3}$$

In the sequel, our analysis relies on the ideas in the proof of Benaïm [2, Proposition 3.3] (which are
in fact a reduction to the finite state space case of the general framework developed in Benveniste
et al. [4, Part II]) to show that hypothesis (c) of Theorem 2.2 holds (since assumptions (a) and (b)
are immediately satisfied).



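Fix $i \in \{1,\dots,N\}$ and $(x,\lambda) \in X \times \Delta$. Let us define the matrices $H^i(x,\lambda)$ and $W(x,\lambda)$ by

$$H^i_{(a,b)}(x,\lambda) = \begin{cases} G^i(s,w) - x^{is^i}, & \text{if } s^i = r^i, \\ 0, & \text{otherwise,} \end{cases} \tag{6.4}$$

and

$$W_{(a,b)}(x,\lambda) = \tau_b(x,\lambda), \tag{6.5}$$

for every $a = (s,w)$, $b = (r,w') \in S \times W$ and $\tau_b(x,\lambda)$ given by equation (6.1). Notice that, for
every $s \in S^i$, $a_{n+1} = (s_{n+1},w_{n+1})$ and $b_{n+1} = ((s,s^{-i}_{n+1}),w_{n+1})$, the following equality holds

$$H^i_{(a_{n+1},b_{n+1})}(x_n,\lambda_n) = \big(G^i(s,s^{-i}_{n+1},w_{n+1}) - x^{is}_n\big)\,1_{\{s=s^i_{n+1}\}},$$

and

$$\begin{aligned}
\big[W(x_n,\lambda_n)H^i(x_n,\lambda_n)\big]_{(a_{n+1},b_{n+1})} &= \sum_{a=(r,w)} \tau_a(x_n,\lambda_n)\big(G^i(r,w) - x^{ir^i}_n\big)\,1_{\{s=r^i\}} \\
&= \sigma^{is}(x^i_n)\Big(\sum_{w \in W} \tau_w(\lambda_n)\,G^i(s,\sigma^{-i}(x_n),w) - x^{is}_n\Big) \\
&= \sigma^{is}(x^i_n)\big(G^i(s,\sigma^{-i}(x_n),\tau(\lambda_n)) - x^{is}_n\big),
\end{aligned}$$

so that

$$\big[(I - W(x_n,\lambda_n))H^i(x_n,\lambda_n)\big]_{(a_{n+1},b_{n+1})} = \lambda^{is}_n U^{is}_{n+1}. \tag{6.6}$$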
Lemma 6.3. There exists a $C^1$ function $Q^i(x,\lambda)$ such that

$$(I - P(x,\lambda))\,Q^i(x,\lambda) = (I - W(x,\lambda))\,H^i(x,\lambda). \tag{6.7}$$

Outline of the proof. The required function is given by $Q^i(x,\lambda) = E(x,\lambda)H^i(x,\lambda)$, where

$$E(x,\lambda) = \int_0^{+\infty} \big(E_t(x,\lambda) - W(x,\lambda)\big)\,dt,$$

with $E_t(x,\lambda)$ being the matrix solution of the linear differential equation

$$\frac{d}{dt}E_t(y) = -(I - P(y))\,E_t(y), \qquad E_0(y) = I.$$

See Benaïm [2, Lemma 5.1] (and the discussion preceding it) for further details.


□

Proof of Proposition 6.2. To prove that the third assumption in the statement of Theorem 2.2 is
satisfied almost surely, we have to show that the noise term $(\varepsilon_n)_n = (U_n,M_n)_n$, where $U^{is}_n = u^{is}_n$
(see equation (6.2)), satisfies

$$\sup\Big\{\Big\|\sum_{l=n}^{k-1} \frac{1}{l+1}\,\varepsilon_{l+1}\Big\| ;\ k \in \Big\{n+1,\dots,m\Big(\sum_{j=1}^{n} \frac{1}{j} + T\Big)\Big\}\Big\} \to 0, \tag{6.8}$$

as $n \to +\infty$, for every $T > 0$. Since $(M_n)_n$ is an almost surely bounded martingale difference
process plus a vanishing term, it is sufficient to show that (6.8) holds for $(U_n)_n$. For a matrix $B$ (of
appropriate dimension) and a fixed $s \in S^i$, let $B[n]$ denote the
$(a_n,b_n)$ entry of $B$, where $a_n = (s_n,w_n)$ and $b_n = ((s,s^{-i}_n),w_n)$. Now, according to (6.6) and
Lemma 6.3, we have, for every $n \ge 1$,

$$\begin{aligned}
\frac{1}{n+1}\,u^{is}_{n+1} &= \frac{1}{(n+1)\lambda^{is}_n}\big((I - W(x_n,\lambda_n))H^i(x_n,\lambda_n)\big)[n+1] \\
&= \frac{1}{(n+1)\lambda^{is}_n}\big(Q^i(x_n,\lambda_n)[n+1] - P(x_n,\lambda_n)Q^i(x_n,\lambda_n)[n+1]\big) \\
&= u^{is}_{n+1}(1) + u^{is}_{n+1}(2) + u^{is}_{n+1}(3) + u^{is}_{n+1}(4),
\end{aligned}$$

where

$$\begin{aligned}
u^{is}_{n+1}(1) &= \frac{1}{(n+1)\lambda^{is}_n}\big(Q^i(x_n,\lambda_n)[n+1] - P(x_n,\lambda_n)Q^i(x_n,\lambda_n)[n]\big), \\
u^{is}_{n+1}(2) &= \frac{1}{(n+1)\lambda^{is}_n}\,P(x_n,\lambda_n)Q^i(x_n,\lambda_n)[n] - \frac{1}{n\lambda^{is}_{n-1}}\,P(x_n,\lambda_n)Q^i(x_n,\lambda_n)[n], \\
u^{is}_{n+1}(3) &= \frac{1}{n\lambda^{is}_{n-1}}\,P(x_n,\lambda_n)Q^i(x_n,\lambda_n)[n] - \frac{1}{(n+1)\lambda^{is}_n}\,P(x_{n+1},\lambda_{n+1})Q^i(x_{n+1},\lambda_{n+1})[n+1], \text{ and} \\
u^{is}_{n+1}(4) &= \frac{1}{(n+1)\lambda^{is}_n}\,P(x_{n+1},\lambda_{n+1})Q^i(x_{n+1},\lambda_{n+1})[n+1] - \frac{1}{(n+1)\lambda^{is}_n}\,P(x_n,\lambda_n)Q^i(x_n,\lambda_n)[n+1].
\end{aligned}$$

Note that, almost surely, $\|u^{is}_{n+1}(2)\| \le C/n^2$ and $\|\sum_{l=n}^{k-1} u^{is}_{l+1}(3)\| \le C/n$ since the functions
$P$, $Q^i$, and the process $(\lambda_n)_n$ are bounded (for a generic positive constant $C$). We also have that
$\|u^{is}_{n+1}(4)\| \le \|(x_{n+1},\lambda_{n+1}) - (x_n,\lambda_n)\| \cdot C/n \le C/n^2$ because of the smoothness of $PQ^i$ and the
compactness of $X \times \Delta$, and that $(\sum_{l=1}^{n} u^{is}_{l+1}(1))_n$ is a convergent martingale (since $\|u^{is}_l(1)\|^2 \le C/l^2$).
So that

$$\Big\|\sum_{l=n}^{k-1} \frac{1}{l+1}\,U^{is}_{l+1}\Big\| \le \Big\|\sum_{l=n}^{k-1} u^{is}_{l+1}(1)\Big\| + O\Big(\frac{1}{n}\Big),$$

which implies that (6.8) holds.

□



Once we have the connection between the discrete process (APDM) and the continuous dynamics
(6.3), the task of finding explicit almost sure convergence results becomes more difficult because of
the general form of the function $P$ defined in (A2). However, it is possible to state an analogue of
Proposition 4.5 in the Logit decision rule case by strengthening the conditions on the parameters
$\beta_i$ and assuming that the variation of payoffs is small. Recall that $\alpha = \max_{i \in \mathcal{A}} \sum_{j \ne i} \beta_j$.

Proposition 6.4. Under the Logit decision rule (5.1), let $\eta_w$ be the maximal unilateral deviation
that a player can face for a fixed $w \in W$ (see (5.3)). Set $\bar\eta = \max_w \eta_w$ and $\tilde\eta = \max_{i,s,w,w'} |G^i(s,w) -
G^i(s,w')|$. If

$$2\bar\eta\alpha + \tilde\eta\sqrt{|W|}\,\kappa < 1, \qquad 2\max_i \beta_i < 1, \tag{6.9}$$

where $\kappa$ is the Lipschitz constant of the function $\tau$ (from $\|\cdot\|_\infty$ to $\|\cdot\|_2$), then the discrete process (APDM)
converges almost surely to the global attractor $\{(x_*,\lambda_*)\}$ for the dynamics (6.3).

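Proof. It is not difficult to see that, under condition (6.9), there exists a unique rest point $(x_*,\lambda_*) \in
X \times \Delta$ of (6.3). Consider the function $V$ defined by $V(x,\lambda) = \max\{\|x - x_*\|_\infty,\ \|\lambda - \lambda_*\|_\infty\}$.
Now, if $V(x_t,\lambda_t) = \|\lambda_t - \lambda_*\|_\infty \ge \|x_t - x_*\|_\infty$, then $\dot{V}(x_t,\lambda_t) \le -K V(x_t,\lambda_t)$ for some $K > 0$,
using exactly the same computations as in Proposition 4.5, the fact that $2\max_i \beta_i < 1$ and that
$|\sigma^{is}(x) - \sigma^{is}(y)| \le 2\beta_i \|x - y\|_\infty$ for all $i \in \mathcal{A}$ and $s \in S^i$ (see e.g., Cominetti et al. [9, Proposition 5]).
On the other hand, if $V(x_t,\lambda_t) = \|x_t - x_*\|_\infty \ge \|\lambda_t - \lambda_*\|_\infty$, assume that the maximum is
attained on the $is$ coordinate, that $x^{is}_t \ge x^{is}_*$, and define $F^{is}(x,\tau) = G^i(s,\sigma^{-i}(x),\tau)$. So that

$$\begin{aligned}
\frac{d}{dt}\big(x^{is}_t - x^{is}_*\big) &= \frac{\sigma^{is}(x^i_t)}{\lambda^{is}_t}\Big[-(x^{is}_t - x^{is}_*) + F^{is}(x_t,\tau(\lambda_t)) - F^{is}(x_*,\tau(\lambda_t)) \\
&\qquad\qquad\quad + F^{is}(x_*,\tau(\lambda_t)) - F^{is}(x_*,\tau(\lambda_*))\Big] \\
&\le \frac{\sigma^{is}(x^i_t)}{\lambda^{is}_t}\Big[-\|x_t - x_*\|_\infty + 2\bar\eta\alpha\,\|x_t - x_*\|_\infty + \tilde\eta\,\|\tau(\lambda_t) - \tau(\lambda_*)\|_1\Big] \\
&\le -\xi\big(1 - 2\bar\eta\alpha - \tilde\eta\sqrt{|W|}\,\kappa\big)\,V(x_t,\lambda_t),
\end{aligned}$$

for some $\xi > 0$ such that $\sigma^{is}(x) \ge \xi$ for every $i \in \mathcal{A}$ and $s \in S^i$. Notice that we have used the fact
that, for every $w' \in W$,

$$\begin{aligned}
F^{is}(x_*,\tau(\lambda_t)) - F^{is}(x_*,\tau(\lambda_*)) &= \sum_{w \in W} \big(\tau_w(\lambda_t) - \tau_w(\lambda_*)\big)\big(F^{is}(x_*,w) - F^{is}(x_*,w')\big) \\
&\le \tilde\eta\,\|\tau(\lambda_t) - \tau(\lambda_*)\|_1 \\
&\le \tilde\eta\sqrt{|W|}\,\|\tau(\lambda_t) - \tau(\lambda_*)\|_2 \le \tilde\eta\sqrt{|W|}\,\kappa\,\|\lambda_t - \lambda_*\|_\infty.
\end{aligned}$$

Again, if $x^{is}_t - x^{is}_* < 0$, the computations are the same. Then $V$ is a strict Lyapunov function. □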



The constant case 

We restrict our attention to the constant case, i.e., the function $P$ is such that $P(\lambda) = P_*$ and
consequently $\tau(\lambda) = \tau_*$, where $\tau_*$ is the unique invariant measure of $P_*$. Therefore, the condition
ensuring convergence in Proposition 6.4 reduces to $2\bar\eta\alpha < 1$ (by considering the same Lyapunov
function as in Proposition 4.5).

Notice that, in this case, $\nabla_x\Psi_x(x_*,\lambda_*)$ is an average (by $\tau_*$) of the Jacobian matrices $\nabla_x\Psi_x(x_*,\lambda_*,w)$,
where

$$\Psi^{is}_x(x,\lambda,w) = \frac{\sigma^{is}(x^i)}{\lambda^{is}}\big[G^i(s,\sigma^{-i}(x),w) - x^{is}\big],$$

for every $w \in W$. Then the estimate given by (5.4) can be obtained in exactly the same manner
as in Lemma 5.3 for the function $\Psi$ if $2\bar\eta\alpha < 1$. Hence, as in the deterministic environment case,
the matrix $\nabla\Psi(x,\lambda)$ is stable for every rest point $(x,\lambda)$ of (6.3) if $1 < 2\bar\eta\alpha < 2$, and Proposition 5.11
can also be recovered in this framework. Let $\mathcal{Y}$ be the set of rest points of the dynamics (6.3).

Proposition 6.5. If an attractor $A$ for the dynamics (6.3) verifies that $B(A) \cap \mathcal{Y} \ne \emptyset$, then
$P(\mathcal{L}(x_n,\lambda_n) \subset A) > 0$. In particular, under the Logit decision rule (5.1), if $1 < 2\bar\eta\alpha < 2$,
then the set $\mathcal{Y}$ reduces to one point $(x_*,\lambda_*)$ and $P((x_n,\lambda_n) \to (x_*,\lambda_*)) > 0$.

Proof. We only have to check that the conclusion of Lemma 5.10 (regarding the attainable points
of the process (APDM)) is satisfied in this case, since the remaining assumptions needed to apply [3,
Theorem 7.3] hold just as before. For the sake of simplicity, let us suppose that $\tau^w_*$ is rational for
all $w \in W$. Let $(x,\lambda)$ be such that $x^{is} = G^i(s,\lambda^{-i},\tau_*)$ for all $i \in \mathcal{A}$ and $s \in S^i$. Take $m \in \mathbb{N}$
large such that, for every $w \in W$, $m\tau^w_* \in \mathbb{N}$. Then, for a fixed $w$, repeat the block argument in the
proof of Lemma 5.10 to construct a $w$-block of play. Then consider the block of play that consists
of playing each $w$-block $m\tau^w_*$ times, for each $w$. This sequence generated by (APDM) carries the
process $(x_n,\lambda_n)_n$ close to $(x,\lambda)$. □

Remark 6.6. In fact, without assuming that $P$ is constant, we have shown that for every $(x,\lambda) \in
X \times \Delta$ such that $x^{is} = G^i(s,\sigma^{-i}(x),\tau(\lambda))$ for all $i \in \mathcal{A}$ and $s \in S^i$, the point $(x,\lambda)$ is attainable
by (APDM). Hence, in particular, any rest point of (6.3) is attainable and the first part of the
statement of Proposition 6.5 holds under condition (A) on the decision rule $\sigma$ alone.